Rev. Bras. Ciênc. Solo. 2021;45:e0210084.

Optimized data-driven pipeline for digital mapping of quantitative and categorical properties of soils in Colombia

Alejandro Coca-Castro, Joan Sebastián Gutierrez-Díaz, Victoria Camacho, Andrés Felipe López, Patricia Escudero, Pedro Karin Serrato, Yesenia Vargas, Ricardo Devia, Juan Camilo García, Carlos Franco, Janeth González

24/Nov/2021

DOI: 10.36783/18069657rbcs20210084

Graphical Abstract

Highlights

We propose a toolbox facilitating the workflow of a digital soil mapping project.

The toolbox was tested across a relatively large area of 14,537 km² in Colombia.

The results confirm that the derived products offer the robustness required for a DSM project.

Parallel processing increases the toolbox's performance in the covariate selection and modeling steps.

ABSTRACT

Soil maps provide a method for graphically communicating what is known about the spatial distribution of soil properties in nature. We proposed an optimized pipeline, named the dino-soil toolbox and programmed in R, for mapping quantitative and categorical soil properties from legacy soil data. The pipeline, composed of four main modules (data preprocessing, covariate selection, exploratory data analysis, and modeling), was tested across a study area of 14,537 km² located between the departments of Cesar and Magdalena, Colombia. We assessed the feasibility of using the toolbox to model three soil properties: pH at two depth intervals (0.00-0.30 and 0.30-1.00 m), soil taxonomy (great group), and taxonomic family by particle size, based on a set of 25 environmental factors derived from auxiliary layers of climate, land cover, and terrain. As a result, we successfully deployed the proposed semi-automatic and sequential pipeline, yielding rapid digital soil mapping (DSM) outputs across the study area. By providing multiple outputs such as tables, charts, maps, and geospatial data across its four main modules, the pipeline offers considerable robustness to support the outcomes and analysis of a DSM project. Future studies could extend the pipeline with further machine learning frameworks for predictive modeling of soil properties, such as ensembles and deep learning models, which have shown high performance in DSM.
