Mapping soil properties in a poorly-accessible area

Soil maps are important to evaluate soil functions and support decision-making process, particularly for soil properties such as pH, carbon content (C), and cation exchange capacity (CEC), but the spatial resolution and soil depth should meet the needs of users. On another hand, the efficiency of statistical models to create soil maps, with an acceptable level of accuracy, often require a large number of samples with an appropriate distribution across the area of interest. However, accessibility for sampling can be a trouble in remote areas, such as the Itatiaia National Park (INP). The hypothesis of this work is that it is possible to obtain a viable result in soil mapping of areas with limited access by using DSM tools. The general objective of this paper was to create 2and 3-D maps of the soil properties pH, carbon content, and CEC, with the correspondent spatial uncertainty, in the INP plateau. The sampling strategy was designed using conditioned Latin Hypercube Sample (cLHS), and different methods were tested to produce the soil properties maps. For calibration of the models: linear (MLR, multiple linear regression) and nonlinear (GAM, Generalised Additive Models). The results showed differences in predictive performance for all statistical methods and covariate selection approaches. The GAM, with covariates selection based on soil formation factors, was the best method for the limited number of soil samples. The greatest uncertainty was associated with areas with the lowest accessibility and, consequently, with low sampling density and/or noises in covariates. Even though the 2and 3-D maps of soil properties, each associated with explicit uncertainty, can contribute to the INP decision makers/ managers by providing information not available before.


INTRODUCTION
Soil is a vital part of the natural environment and it has a crucial role in ecosystem functioning (Adhikari and Hartemink, 2016). The soil functions can be derived from interactions of soil properties. To predict soil properties to assess their functions can provide detailed spatial information particularly useful in complex mountain terrain (Jeong et al., 2017). Soil information is an extremely important factor for conservation and sustainable management, and is essential in the formulation of sustainable agricultural policies and monitoring impacts of inappropriate use of resources (Carvalho Júnior et al., 2016), especially in mountain areas.
A relatively new approach that gives useful soil information is the 3-D modeling of soil properties. The modeling of soil properties in three dimensions has been evaluated in several studies (Kidd et al., 2015;Mulder et al., 2016;Amirian Chakan et al., 2017), including the assessment of associated uncertainty Poggio and Gimona, 2017a, b). With the progress of digital soil mapping (DSM), there is rising use of 3-D modeling to provide information on soil patterns for applications, from agricultural management to ecosystem services (Zhang et al., 2017) and so on. In terms of the evaluation of predicted results besides the estimation of the errors, such as Accuracy and kappa in a categorical map and R2, RMSE, ME in the continuous map, it is important to evaluate the uncertainty associated with the prediction. This is another important information to guide land management choices and decisionmakers (Poggio and Gimona, 2017a).
In recent years, there was a considerable advance in DSM due to new approaches, among them, powerful predictive algorithms (Beguin et al., 2017;Sindayihebura et al., 2017); models combining machine-learning and geostatistical tools (Poggio and Gimona, 2017a,b); expert knowledge-based methods (Menezes et al., 2014(Menezes et al., , 2018; and high-resolution soil maps (Nussbaum et al., 2017). However, the limiting factor is often the number of soil data used for model calibration (Samuel-Rosa et al., 2015;Somarathna et al., 2017). It was suggested that more data was more important than a better model (Somarathna et al., 2017). However, obtaining more data can be a problem because of the size and/or the accessibility of some test areas.
To facilitate DSM in poorly-accessible areas, Cambule et al. (2013) proposed a methodology of sampling in the area of greater accessibility, which is representative of the total area, and to evaluate the representativeness using, e.g., the similarity between the histogram of the covariates for the total and accessible areas. Other studies considered the costs of accessibility in soil sampling (Roudier et al., 2012;Carvalho Júnior et al., 2014;Stumpf et al., 2016) using a variation/optimization of the method known as conditioned Latin Hypercube Sampling (cLHS), proposed by Minasny and McBratney (2006). The cLHS is a robust tool for the allocation of sampling points by means of a set of auxiliary covariates. The idea is to be able to capture the maximum of soil variation, and its properties, by using environmental covariates as auxiliary information.
The Itatiaia National Park (INP) (Brazil) has limited access due to the steep slopes, dense forest cover, rocky outcrops, and altitude vegetation fields in the upper plateau (Barreto et al., 2013), all that making the INP an excellent case study. To obtain a viable result with a low cost, it is important to use DSM tools, ranging from optimization of the sampling site (Minasny and McBratney, 2006;Roudier et al., 2012;Stumpf et al., 2016) to the covariate selection using powerful predictive algorithms (Beguin et al., 2017;Jeong et al., 2017). Based on the above and considering the advancements and challenges of DSM, it is necessary to optimize financial and human resources to produce quality information that can be useful for decision-makers.
The hypothesis of this work is that it is possible to obtain a viable result to map soil properties in areas with limited access by using DSM tools.The general objective of this paper was to create 2-and 3-D models of soil properties (pH, carbon content, and cation exchange capacity), with the correspondent spatial uncertainty, with a resolution of 25 m and in a poorly accessible area of the INP plateau. For that, it was necessary to design a sampling strategy that balanced accessibility, costs, area, and environmental covariates; and to model soil properties, with the number of samples available. A goal of this study was to contribute with information for the management plan of the INP, regarding the soil properties studied and their potential relationship to the ecosystems in the park.

Study area
The INP has an area of 225.54 km 2 , and it is located in the Serra da Mantiqueira, the border region between the Minas Gerais (MG) and Rio de Janeiro (RJ) States (Barreto et al., 2013). According to Tomzhinski et al. (2012), the INP can be divided into three broad areas: the "Lower part", which comprises the southern part of the park, the "upper part" of the plateau ( Figure 1) and Visconde de Mauá in the east side.

Data sources and environmental covariates
The environmental covariates used to model the soil properties were derived from three data sources: a digital elevation model, remote sensing data (orbital image), and a geology map (Table 1). They were chosen to describe the main soil-forming factors, according to the scorpan approach (McBratney et al., 2003).

Digital Elevation Model (DEM)
The DEM used, with a spatial resolution of 25 m, was generated from contour lines, with 20 m equidistance, and hydrography extracted from plani-altimetric charts, both in the

Satellite image
Two scenes from the RapidEye sensor were used, scene 1 (from 02/07/2011) and scene 2 (16/08/2011). The two scenes were used to cover the entire study area. They have a 12-bit radiometric resolution, 6.5 m spatial resolution, and were orthorectified to 5 m spatial resolution (RapidEye, 2012). Both images were atmospherically corrected using the 6S model (Vermote et al., 1997).

Geology map
It was obtained from Santos et al. (2000), and it was scanned, vectorized, and georeferenced. The file was rasterized at the same spatial resolution as the DEM (25 m).

Soil dataset and sampling strategy
The soil dataset used for the DSM modeling had a total of 90 soil profiles, being 359 horizons with a morphological description, and 346 horizons with analytical data.
The approach used in this study to select the sampling points followed the principles proposed by Minasny and McBratney (2006). The points were selected by using cLHS with auxiliary covariates, and considering the access costs (Roudier et al., 2012;Carvalho Júnior et al., 2014;Stumpf et al., 2016). As a constraint, three buffer sizes were created in relation to roads and trails, as proposed by Carvalho Júnior et al. (2014). After testing the distances of 100, 200, and 400 m, with no significant differences between buffers (data not shown here), the distance of 100 m was selected to represent the accessible area.
The auxiliary covariates used to select sampling locations were: geology, elevation, slope, northernness, and soil-adjusted vegetation index. From the 90 points, 74 were selected for soil profile description. Also, legacy data (Soares et al., 2016) and data unpublished from field trips in the area were added to the database. In addition, other samples were selected based on the expert pedological knowledge and the relationship between soil and landscape, as recommended in the methods for conventional soil surveys (IBGE, 2015). These additional data (n=16) were obtained from places inside and outside the area in which the sampling was considered of higher accessibility (buffer of 100 m).

Covariates selection approach
To evaluate the relationships between soil properties such as pH, total carbon content (C), and cation exchange capacity (CEC), and environmental covariates, Multiple Linear Regression (MLR) and Generalized Additive Models (GAM) were tested ( Figure 2). The MLR is a parametric method, and it assumes that the relationships between dependent variable and covariates are linear (Hastie et al., 2009). The GAM is a flexible statistical method that may be used to identify and characterize nonlinear regression effects through smoothing functions (Hastie et al., 2009;Wood, 2017).
The selection of the covariates was carried out to produce simpler models with the minimum number of covariates, and still able to explain the maximum of the data variability. Different strategies were used and they are described below.
Step one -correlation cut off: it evaluated the correlation between covariates. If two covariates had a coefficient greater than 0.85 (the cutoff value considered for this study), only one was maintained. The covariate used in the model was presumed to have a greater relationship with the SCORPAN (McBratney et al., 2003) model, i.e., greater pedological information.
Step two -selection using different approaches: it involved fitting models using the covariates maintained in the first step. But whereas fitted MLR models were fitted with all Rev Bras Cienc Solo 2020;44:e0190107 covariates, the same was not possible for GAM, due to limitation of degrees of freedom for many covariates and the few soil samples (Poggio et al., 2013). The purpose of the model with all covariates is to have a basis of comparison with different methods of selection commonly used. This last has recently been used in soil science for variable selection in machine-learning algorithms, and it is a backward selection using rank (Jeong et al., 2017;Vašát et al., 2017). The backward selection algorithm iteratively eliminates the least promising predictors from the model based on an initial predictor importance measure. When the full model has created a measure of variable importance is computed and shows the ranks of predictors from most to least important (Kuhn, 2017).
GAM: the approach was different due to the degree of freedom limitation. In this case, the models are penalized by the low number of points (soil data) and a large number of covariates, i.e., large parametric parameters of the GAM model. The models were fitted based on the stepwise forward approach, where covariates are added according to AIC. All models began with geographic coordinates (X, Y) and geology as fixed covariates. Three models were fitted. The first using the base model, where each covariate was added in the base model individually, and then evaluated by its AIC. The model ran with all covariates, and the four with lower AIC value composed the final model termed GAM_one. The second model consisted of making all possible combinations of four covariates and then run the model with all possible combinations. The combination with a smaller AIC was termed GAM_comb. This approach seeks to capture the interaction between covariates when a predictor model is fitted. In both cases (GAM_one and GAM_comb), it was included as many covariates as possible; since the base model already had nine covariates X, Y, and seven different levels for geology, it was possible to include another four covariates totaling a model with a maximum of 13. The third model involved a more parsimonious model based on the scorpan approach (McBratney et al., 2003). In this case, in addition to the base model that already included the parent material (geology) and spatial position (X, Y), for 2-D prediction and geology, and X, Y and depth (Z) for 3-D prediction, different combinations of data derived from the satellite image (in this way adding the factor organism associated to the land coverage) and data derived from the DEM (mainly represent factor relief, topography) were tested.
In all, possible combinations were tested for each soil property using external validation, but the same procedure was repeated using cross-validation. The best model in both evaluations was selected and termed GAM_scorpan.
The GAM model was selected for the 3-D approach due to its simplicity, and being a flexible approach that can deal with both linear and non-linear relationships between soil properties and the considered covariates (Poggio and Gimona, 2017a). Also, the 3-D smooth can provide a better performance, considering non-linear relationships between covariates and soil properties (Poggio and Gimona, 2017a), which are frequent when modeling natural environments.
For all combinations of models and covariate selection, an extension of the scorpan-kriging approach, hybrid geostatistical model, i.e., GAM-kriging or MLR-kriging was tested, combining the models with spatially correlated errors. However, the short-range spatial structure combined with the sparse sampling along the roads and tracks led to the fact that the errors do not show spatial dependence. Thus, this analysis was not fully accomplished.

Validation and uncertainty
The model's performance was evaluated in two ways: the first by external validation, where points selected by the cLHS n = 74 soil profiles (Minasny and McBratney, 2006) were used to fit the models. To validate the performance of the models, there were used n = 16 profiles, from the legacy data (retrieved from the literature), as well as extra points collected in the field based on the pedological knowledge and the soillandscape relationship (without pre-selection). In training, samples were taken within a 100 m buffer in relation to roads and tracks. The validation samples include points inside and outside the buffer, defined as accessibility criterion; the second form of evaluation was leave-one-out cross-validation (LOO-CV) . In both cases, the Mean Square Error (MSE) and Root Mean Square Error (RMSE) were computed.
Rev Bras Cienc Solo 2020;44:e0190107 And a coefficient of determination was derived from a linear model between observed and predicted data (R 2 ). For 3-D soil modeling, the results of the modeling were summarized for the whole profile and at five depth layers, according to Global-Soil Map project specification (Arrouays et al., 2014), and compared with observed values from corresponding depths. Uncertainty propagation was analyzed through simulation (N = 1000) from the posterior distribution of fitted GAMs to derive simultaneous confidence intervals for the derivatives of penalized splines (Ruppert et al., 2003). All algorithms implementation, spatial prediction, and uncertainty analysis were done using the R program (R Core Team, 2018).

Correlation analysis between covariates
Strong relationships were observed between covariates derived from the satellite image, most of them with a correlation greater than 0.85 (Figure 3). They contributed in a similar way as information of vegetal coverage or land use, and their use may impair the model's fitness due to multicollinearity problems. The covariates CHND, band1, band2, band3, and SAVI were excluded, due to a correlation greater than 0.85 with one or more covariates. The CHND showed a strong correlation with elevation values.
It is usually necessary to decrease the number of covariates in the GAM models, especially when there is limited number of soil samples, as in this study. Since there was no high relation (no greater than the cut-off value 0.85) between covariates derived from satellite image and DEM (Figure 3), all possible combinations were tested to build the scorpan GAM model.

The 2-D approach
For the model's comparison by soil properties, for pH prediction using linear models, the best method of covariate selection was the RFE, in both cases, using external and cross-validation (Table 2). However, this pattern was not repeated for the other soil properties, C and CEC (Tables 3 and 4). For the total soil C and CEC, the RFE method presented the worst performance for the linear model (Table 2). In the case of the linear model, the most commonly used method for covariate selection, the stepwise method increased the R 2 coefficients when compared to other linear methods, for both validation methods -external validation and cross-validation, for C and CEC prediction and also RFE for pH.
For the GAM models, the best method of covariate selection was the scorpan approach, in both cases, using external, and cross-validation (Tables 2, 3, and 4). It was also the best model when compared with different approaches to select covariates in MLR models (Tables 2, 3, and 4). The scorpan model remained the best approach for covariate selection in GAM models.
For CEC prediction, all the best models in each approach had better performance when used external validation for evaluating. However, the validation samples are not completely random, and this may overestimate the model's performance for the external validation approach. Regardless of the differences, the best model selected in the external validation was also the best model in the cross-validation.
About the spatial prediction and uncertainty propagation, the predicted values (in the grid) and the observed values using the best covariate selection approach for MLR and GAM models are presented in the figure 4. For carbon content, the extreme values (minimum and maximum) were similar to the prediction made by MLR, with maximum values close to measured values, but the minimum showed negative values. The uncertainty in the predictions of soil properties for the superficial layer was mainly associated with extrapolation of values for regions not sampled, and the INP boundaries with greater limitation of access (Figure 4).

The 3-D approach
On the continuous depth function using GAM scorpan, based on results for topsoil layer prediction, the GAM scorpan approach was chosen to predict the soil properties for the whole profile. In this case, besides the base model of 2-D GAM with covariates X, Y, and geology, the soil depth (Z) was added as a covariate in the base model to create a smoother 3-D. As with 2-D modeling, the base model was used to test different combinations of properties derived from DEM and satellite images. Since most of the soils in the INP have shallow profiles, it was considered for prediction the maximum depth limit of 1.00 m. A greater soil profile depth than that represents less than 23 % of the total data ( Figure 5).
For model evaluation, it was used the descriptive statistic for the whole soil profile, and predictions for soil pH are very close (Tables 3) to observed values (Table 3), especially when using the cross-validation approach ( Table 3). The values of the determination Rev Bras Cienc Solo 2020;44:e0190107 coefficient for carbon content and CEC are higher among observed and predicted values than for the pH, especially in cross-validation (Table 3). The magnitude of errors, RMSE, and MSE shows a tendency to extrapolate low carbon contents.
Although there is a positive relationship between predicted and observed carbon values, they decrease in depth ( Figure 5) and there is a tendency for very low values for depths greater than 0.30 m (Table 4). This is especially true for soils that begin with low levels of soil carbon, such as mineral soils. Particularly in the deeper layers, the low number of points to represent these layers affected the prediction (Tables 4). The form of the model evaluation, external data or cross-validation, leads to different results by layer. For cross-validation, in which better results were obtained, the best performance was 0.05-0.15 m, for pH and C, and 0.60-1.00 m for CEC (Table 4).
On the spatial prediction and uncertainty propagation, when comparing the spatial prediction of the models, the cross-validation showed better results, so these models   were used to predict the soil properties on the grid and to produce the maps (Figure 7). Some extrapolated values for C and CEC are affected, among other factors, by the access limitations; consequently, we had a smaller number of points to calibrate the models and limited spatial distribution of the soil samples.

2-D approach
The performance gain in MLR models, already in the selection of covariates by correlation, excluding those with highly correlated (Figure 3), is probably due to the multicollinearity problem (Kempen et al., 2009;ten Caten et al., 2011), which significantly affects the model performance. The MLR with several covariates showed a tendency to have the worst performance because of its effect of harmful multicollinearity in the parametric models; thereby impairing the model. This leads to the problem of inflating the variance of parameters, model over-fitting, and even noise problems (Kempen et al., 2009;ten Caten et al., 2011;Nussbaum et al., 2018). This is especially important if we have a limited number of samples and a large number of covariates.
The MLR model presented regular performance, and it was worse than GAM (Table 2).
In the RFE used in Random Forest, as it was done Jeong et al. (2017), the best results can be due to the parameters optimization using cross-validation inside the algorithm (using caret package), thus selecting an optimal value of the number of trees and mtry parameter. This pattern was not observed in the MLR using RFE. For linear models, the RFE selection method (using the RFE lmFuncs function on caret package) did not present good results; it was the model with the worst result, except for soil pH ( Figure 5). Furthermore, it almost always selects the same covariates, regardless of the soil properties tested. This suggests that selection algorithm using the function (rfe) for linear models in the caret package should be used carefully. In general, the best way to select covariates for MLR is the stepwise selection with AIC criteria, a common method to select models in linear regression Chagas et al., 2016;Vermeulen and Niekerk, 2017).
The GAM_scorpan was the most appropriate model for the prediction of all soil properties. It presented the best performance, in both ways, when evaluated using external validation and with cross-validation. However, this was the only model where, in all soil properties, performance in the cross-validation was lower than in the external validation (Tables 3 and 4). This may be due to the fact that external validation samples are not probabilistic, as suggested by Brus et al. (2011), and do not include all geographic and attribute space. For these conditions, where the validation samples come from various sources and are not completely random, and there is a limited set of soil data, cross-validation is the best approach to evaluate the models.
For the three soil properties and selection methods evaluated in this work, the linear models showed inferior performance to GAM. This is probably because relationships between soil attributes and covariates are not linear, and models such as the MLR fail to capture the nonlinear relationships efficiently (Poggio et al., 2013;Jeong et al., 2017).
Since, in general, soil properties do not have linear relationships with environmental covariates, the models that captured these relationships tend to be better. In contrast, GAM models, where it is possible to model nonlinear relationships (Jeong et al., 2017;Poggio and Gimona, 2017 a,b), had the best performance when combined a nonlinear modeling approach and the concept of soil formation factors for covariate selection (expert knowledge).
When analyzing the models separately according to their respective methods of covariates selection, it is observed that, in general, the best results use stepwise selection for MLR (except for predicting soil pH), and GAM using the scorpan approach. It is possible to separate the models and covariates selection approach. This appears to contradict Somarathna et al. (2017), who suggested investing in sampling ( Figure 4) rather than more robust models. However, it agrees with Beguin et al. (2017), who tested different statistical approaches and found significant differences, thus suggesting that robust methods can enhance DSM capabilities and support existing efforts for improving digital soil products, even with limited data.
Pedological knowledge is crucial in DSM and was used by Nussbaum et al. (2018), to exclude covariates with low spatial variation and aggregate levels of categorical variables with low sample density per level. The knowledge of soil-forming factors as well as of the study area is a powerful tool, and when associated with computational tools, it may improve predictions of soil properties and classes.
When evaluating soil-forming factors and pedological elicitation, the elevation, parental material, and covariates from the RapidEye sensor (Table 1) were the factors that most influenced the soil properties of the INP plateau, because they were frequently selected by the different approaches of covariate selection for both models, MLR and GAM. All soil factors are related, for example, the relief has spatial variation, and the highest part is in the center of the study area; in turn, the elevation influences the weather, that is cold and wet in the INP, and that leads to a distinct distribution of plant species. This environment favors accumulation and preservation of soil organic matter due to low temperatures, leading to the formation of the organic soils in the high altitudes (Soares et al., 2016), which explains why the organic soils predominate in the upper part of the INP.
This agreed with the results found in the GAM_scorpan models, the models selected as the best since they combine the most important covariates, related to the parent material, relief, organism, and position in the geographic space. The combined influence of these factors is the main reason for soil formation in the upper part of the INP. For example, the higher carbon and CEC contents, properties strongly related to each other, were predicted with highest values in the areas of INP with altitudinal fields coverage (Figure 1) (predominant species are Poaceae and Cyperaceae), which are concentrated in the plateau central region of INP, where the dominant geology is composed by quartz-syenites and related sediments (Santos et al., 2000;Soares et al., 2016).
When the predicted values (in the grid) and the observed values were compared, it was observed a tendency of the MLR to extrapolate results, especially lower values being more negative; as in the carbon and CEC ( Figure 5). Despite the best performance of the GAM model (Table 2), evaluated by the R2, RMSE, and MSE metrics, there was an extrapolation of values predicted in the grid. This reinforces the importance of evaluating the spatial propagation of uncertainty in DSMs (Stumpf et al., 2016;Vaysse and Lagacherie, 2017). Besides some geology classes occurred in small areas; consequently they had fewer soil samples. A similar pattern was observed by Cambule et al. (2014), predicting carbon stocks in the Limpopo National Park, where they observed high uncertainty values, and it was suggested that it was due to shortrange spatial structure combined with the sparse sampling.

3-D approach
Due to limited accessibility conditions and hence the reduced number of points, the best approach to fit and validate GAM models are smoothing functions using cross-validation. Also, because when there are many covariates and few points, it is not possible to fit GAM models due to the limitation on the degrees of freedom of the model (Poggio et al., 2013). In this case, with limited ground points, the cross-validation seems to be more appropriate to fit the 3-D function (Taghizadeh-Mehrjardi et al., 2016;Amirian Chakan et al., 2017).
Even if more data are available, this is a common approach in recent 3-D soil properties modeling (Veronesi et al., 2014;Mulder et al., 2016). As there are more samples for the topsoil, in general, this layer presented better prediction results as observed by Kempen et al. (2011), Mulder et al. (2016, and Poggio and Gimona (2017a) consequently, larger errors prediction and greater uncertainty are observed in the deepest depths (Figures 6 and 7). In this sense, LOO-CV is the best approach to obtain the results given that in depth the number of samples is even more limited.
The evaluation of spatial uncertainty propagation showed a greater uncertainty for the higher values given by the standard error, due to noise in the very steep areas (above 100 %) and/or shadowed regions detected in the satellite images. Also, areas that had greater uncertainty occurred when the predicted values are associated with higher access limitation, with little or no soil samples. As mentioned, in some areas of INP, the access is very limited, with unused/semi-closed tracks, or no tracks at all, for the North and West directions. In addition, no vehicles, except during an emergency, are allowed to travel inside the park. These reasons led to a low number or no soil sampling at all in some areas of INP (Figure 1). When the profile depth is considered, due to the dominance of shallow soils, the number of soil samples decreases, in the distance and with depth. Some covariates such as DEM also varied significantly (North and West). This contributes to the model's higher extrapolations and uncertainty.
In terms of soil genesis, due to the very steep slopes and the predominantly acidic nature of the parent material, the dominant soils in the INP are the Regosols and Cambic Umbrisols. Under the forest coverage, but with higher slopes, some intermediate developed soils such as Cambisols and Folic Umbrisols occur, with the Regosols occupying the strongly sloping and steep areas. In the higher elevation areas, with altitude fields vegetation and rock exposures, shallow soils such as Histosols and Leptosols with a histic horizon are dominant. In the lower elevations of INP, under forest coverage and located in flat and gently slopes, more weathered and deeper soils such as Rhodic Acrisols and Ferrasols predominate.
In summary, the INP plateau is relatively complex in terms of soil variation and their attributes and most soil properties predictions were produced with admissible modeling diagnostics and uncertainty ranges for LOO-CV (Table 4), when the limitation due to the number of soil samples and their spatial distribution is taken in account. For the remote areas of the INP plateau, the models tend to extrapolate the results for soil carbon content and CEC in deeper layers, especially the MLR in relation to GAM. The same results were observed by Kidd et al. (2015), suggesting that maps should be created with continuous improvements, from the input of newly collected data. Prediction uncertainty can help to choose supplemental sampling to improve the DSM (Li et al., 2016).

CONCLUSIONS
In general, the GAM model had superior performance MLR. The approach based on soil-forming factors showed to be a simple and viable method for covariates selection in the GAM model, especially considering limitations regarding degrees of freedom due to the limited number of soil samples. The elevation, parental material, and covariates from the RapidEye sensor were the factors that most influenced the soil properties of the Itatiaia National Park plateau.
The greater uncertainty of the maps was associated with the low accessibility areas, which had low sampling density and/or noises in the covariates. The 2-and 3-D soil properties modeling with the correspondent uncertainty propagation can be used for INP management of ecosystems.
The high resolution of soil attributes and uncertainties produced for INP in the 3-D space are an important step in developing a comprehensive soil database, carrying quantitative soil-information on a scale adequate to the INP demands.
The sampling planning using cLHS provided a convenient subset of the area representation, increasing the possible combination of ranges from the pool of covariates. This approach enhances the potential of the best model (GAM scorpan) to produce maps, by accounting with the uncertainty, confining sampling points to the absolutely necessary, especially in areas with limited access such as the INP. This strategy contributes to minimize costs when balancing challenges and sampling requirements.