Comparing algorithms to disaggregate complex soil polygons in contrasting environments

Flynn, Trevan, van Zijl, George, van Tol, Johan, Botha, Christina, Rozanov, Andrei, Warr, Benjamin, Clarke, Cathy
Geoderma 2019 v.352 pp. 171-180
decision support systems, expert opinion, landforms, linear models, neural networks, prediction, soil associations, soil types, support vector machines, toposequences, uncertainty, South Africa
In South Africa, the only soil resource with full spatial coverage is the national resource inventory. Disaggregating this polygon-based inventory is thus a logical step towards more detailed soil maps covering the entire country. The polygons are large, encompassing complex soil-terrain patterns, and research into disaggregation techniques has been limited. This study aimed to compare 10 algorithms, implemented through a modified DSMART (“Disaggregating and Harmonizing Soil Map Units Through Resampled Classification Trees”) model, in their ability to disaggregate two polygons into soil associations in two environmentally contrasting locations. One site had high relief and strong catenal sequences (eastern KwaZulu-Natal Province), and the other had low relief and strong geological control of soil types (northern Eastern Cape Province). The algorithms, selected on the basis of previous studies, were k-nearest neighbour, nearest shrunken centroid, discriminant analysis, multinomial logistic regression, linear and radial support vector machines, decision trees, stochastic gradient boosting, random forest, and neural networks. The method involves stratifying the polygons by landform element, randomly sampling the landform elements, allocating soil classes based on the resource inventory, and predicting soil associations across a stack of covariates. This was done iteratively, creating multiple realisations of the soil distribution. Each algorithm's performance was assessed by its kappa and uncertainty. In general, robust linear models that either use embedded feature selection or regularise covariates performed best. In the area with high relief and clear toposequences, nearest shrunken centroid was the top-performing algorithm, with a kappa of 0.42 and an average uncertainty of 0.22. In the area with relatively low relief and complex geology, the results were unsatisfactory: the top-performing algorithm, a regularised multinomial regression, achieved a kappa of only 0.17 with an average uncertainty of 0.84. The results of this study highlight the versatility of this technique for disaggregating South Africa's national resource inventory: algorithms can be chosen on expert knowledge, model averaging can be performed, the top-performing algorithm can be selected, and algorithm parameters can be optimised.
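
To make the two evaluation measures concrete, the sketch below computes Cohen's kappa from observed and predicted soil classes, and summarises multiple realisations into a most probable class and a per-pixel uncertainty. The abstract does not define its uncertainty measure; the confusion index used here (one minus the gap between the two most frequent classes) is an assumption borrowed from common DSMART practice, and the function names `cohen_kappa` and `summarise_realisations` are hypothetical, not the study's code.

```python
from collections import Counter


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # Observed agreement: fraction of sites where the labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1.0 - p_e)


def summarise_realisations(realisations):
    """Summarise realisations (one list of class labels per run, aligned by pixel).

    Returns the most frequent class per pixel and an assumed uncertainty measure,
    the confusion index 1 - (p1 - p2), where p1 and p2 are the proportions of
    realisations assigning the most and second-most frequent class.
    """
    n_real = len(realisations)
    n_pixels = len(realisations[0])
    most_probable, confusion = [], []
    for i in range(n_pixels):
        counts = Counter(run[i] for run in realisations)
        ranked = counts.most_common(2)
        p1 = ranked[0][1] / n_real
        p2 = ranked[1][1] / n_real if len(ranked) > 1 else 0.0
        most_probable.append(ranked[0][0])
        confusion.append(1.0 - (p1 - p2))
    return most_probable, confusion
```

Under this assumed measure, a pixel given the same class in every realisation has a confusion index of 0, and a pixel split evenly between two classes has a confusion index of 1, so lower averages indicate more consistent predictions across realisations.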