A Regional Legacy Soil Dataset for Prediction of Sand and Clay Content with VIS-NIR-SWIR in Southern Brazil
Legacy soil samples provided reliable information through a VIS-NIR-SWIR spectroscopy approach.
Scatter-corrective preprocessing performed similarly to or better than spectral derivatives.
Preprocessing spectra prior to regression analysis did not improve sand prediction.
Clay content was predicted more accurately than sand content.
The best multivariate model to predict sand and clay from VIS-NIR-SWIR spectra was Cubist.
The success of VIS-NIR-SWIR spectroscopy in predicting soil properties has led to considerable investment in large soil spectral libraries. The aims of this study were (1) to develop a VIS-NIR-SWIR soil spectroscopy approach using legacy soil samples to improve spectral soil information at a regional scale; (2) to compare six spectral preprocessing techniques; and (3) to compare the performance of linear and non-linear multivariate models for predicting sand and clay content. A total of 1,534 legacy soil samples stored by Epagri had been collected from agricultural areas in 2009 at a regional scale, covering 260 municipalities of Santa Catarina. Six spectral preprocessing techniques were applied and compared with untreated reflectance spectra (control treatment) in the development of sand and clay prediction models. Five multivariate regression models were compared: Support Vector Machines (SVM), Gaussian Process Regression (GPR), Cubist, Random Forest (RF), and Partial Least Squares Regression (PLS). The scatter-corrective preprocessing techniques performed similarly to or better than the spectral derivatives. In addition, preprocessing spectra prior to regression analysis did not improve sand prediction, since reflectance spectra achieved the best performance with the Cubist, SVM, and PLS models. In general, clay content was predicted more accurately than sand content. The best multivariate model for predicting sand and clay content from soil VIS-NIR-SWIR spectra was Cubist. Cubist performed best when combined with reflectance spectra for sand content (R2 = 0.73; root mean square error, RMSE = 10.60%; ratio of performance to interquartile range, RPIQ = 2.36) and with multiplicative scatter correction (MSC) for clay content (R2 = 0.83; RMSE = 7.29%; RPIQ = 3.70). Based on the mean RMSE values of the validation set, the predictive ability of the multivariate models decreased in the following order for both properties: Cubist > PLS > RF > GPR > SVM.
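The abstract contrasts scatter-corrective preprocessing with spectral derivatives but names only multiplicative scatter correction (MSC). The following is a minimal sketch of the two families, not the authors' implementation: MSC, standard normal variate (SNV) as another common scatter-corrective transform (an assumption, since the six techniques are not listed here), and a plain first difference standing in for the derivative group (studies often use smoothed Savitzky-Golay derivatives instead). The function names `msc`, `snv`, and `first_derivative` are hypothetical, and spectra are assumed to be lists of reflectance values sampled on a common wavelength grid.

```python
from __future__ import annotations

from statistics import mean, stdev


def msc(spectra: list[list[float]], reference: list[float] | None = None) -> list[list[float]]:
    """Multiplicative scatter correction (MSC).

    Each spectrum x is fitted by least squares as x ≈ a + b * reference,
    and the corrected spectrum is (x - a) / b. The reference defaults to
    the mean spectrum of the set (an assumption; other references are used).
    """
    if reference is None:
        reference = [mean(band) for band in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        # Slope and intercept of the ordinary least-squares fit of x on the reference.
        b = sum((r - ref_mean) * (xi - x_mean) for r, xi in zip(reference, x)) / ref_ss
        a = x_mean - b * ref_mean
        corrected.append([(xi - a) / b for xi in x])
    return corrected


def snv(spectra: list[list[float]]) -> list[list[float]]:
    """Standard normal variate: center and scale each spectrum by its own
    mean and sample standard deviation."""
    out = []
    for x in spectra:
        m, s = mean(x), stdev(x)
        out.append([(xi - m) / s for xi in x])
    return out


def first_derivative(spectra: list[list[float]], step: float = 1.0) -> list[list[float]]:
    """First difference divided by the wavelength step (no smoothing);
    each output spectrum has one fewer value than its input."""
    return [[(x[i + 1] - x[i]) / step for i in range(len(x) - 1)] for x in spectra]


# Illustrative values only: two spectra that differ from a reference solely by
# an additive offset and a multiplicative scale, the effect MSC is meant to remove.
reference = [0.20, 0.30, 0.50, 0.40, 0.35]
spectra = [[0.10 + 1.5 * r for r in reference], [-0.05 + 0.8 * r for r in reference]]
print(msc(spectra, reference)[0])  # recovers the reference up to rounding
```

MSC and SNV both remove offset and scale differences between spectra, which is why they are grouped as scatter-corrective; the derivative instead removes the additive offset but keeps the multiplicative scale, so the two families are not interchangeable.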
The predictive accuracy achieved in this study for sand and clay content, using heterogeneous legacy soil samples, confirms the potential of VIS-NIR-SWIR reflectance spectroscopy at a regional scale.
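The validation statistics reported in the abstract (R2, RMSE, RPIQ) and the ranking of models by mean validation RMSE can be sketched as follows. This is a hedged illustration under stated assumptions: R2 is taken as the coefficient of determination 1 - SS_res/SS_tot (some studies report the squared Pearson correlation instead), RPIQ is the interquartile range (Q3 - Q1) of the observed values divided by RMSE, quartiles use Python's `statistics.quantiles` with `method="inclusive"`, and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean, quantiles


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean square error, in the units of the property (e.g. % sand or clay)."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


def r2(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed: list[float], predicted: list[float]) -> float:
    """Ratio of performance to interquartile range: (Q3 - Q1) of observed / RMSE."""
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


def rank_by_mean_rmse(rmse_by_model: dict[str, list[float]]) -> list[str]:
    """Order model names from lowest to highest mean validation RMSE."""
    return sorted(rmse_by_model, key=lambda name: mean(rmse_by_model[name]))


# Illustrative numbers only, not data from this study.
observed = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 39, 48]
print(rmse(observed, predicted), r2(observed, predicted), rpiq(observed, predicted))
```

RPIQ is often preferred over RPD for soil properties with skewed distributions because the interquartile range is less sensitive to extreme values than the standard deviation.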