Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem

Yang, Ren-Min, Zhang, Gan-Lin, Liu, Feng, Lu, Yuan-Yuan, Yang, Fan, Yang, Fei, Yang, Min, Zhao, Yu-Guo, Li, De-Cheng
Ecological indicators 2016 v.60 pp. 870-878
topography, soil organic carbon, uncertainty, soil resource management, models, regression analysis, vegetation cover, environmental factors, carbon sequestration, topsoil, soil fertility, remote sensing, ecosystems, environmental indicators, climate, prediction, China
Soil organic carbon (SOC) plays an important role in soil fertility and carbon sequestration, and a better understanding of the spatial patterns of SOC is essential for soil resource management. In this study, we used boosted regression tree (BRT) and random forest (RF) models to map the distribution of topsoil organic carbon content at the northeastern edge of the Tibetan Plateau in China. A set of 105 soil samples and 12 environmental variables (including topography, climate and vegetation) were analyzed. The performance of the models was evaluated using a 10-fold cross-validation procedure. Maps of the mean values and standard deviations of SOC were generated to illustrate model variability and uncertainty. The results indicate that the BRT and RF models exhibited very similar performance and yielded similar predicted distributions of SOC. The two models explained approximately 70% of the total SOC variability. The BRT and RF models robustly predicted the SOC at low observed SOC values, whereas they underestimated high observed SOC values. This underestimation may have been caused by biased distributions of soil samples in the SOC space. Vegetation-related variables were assigned the highest importance in both models, followed by climate and topography. Both models produced spatial distribution maps of SOC that were closely related to vegetation cover. The SOC content predicted by the BRT model was clearly higher than that of the RF model in areas with greater vegetation cover because the contributions of vegetation-related variables in the two models (65% and 43%, respectively) differed significantly. The predicted SOC content increased from the northwestern to the southeastern part of the study area; the average values produced by the BRT and RF models were 27.3 g kg⁻¹ and 26.6 g kg⁻¹, respectively. We conclude that the BRT and RF methods should be calibrated and compared to obtain the best prediction of SOC spatial distribution in similar regions. In addition, vegetation variables, including those obtained from remote sensing imagery, should be taken as the main environmental indicators and explicitly included when generating SOC maps in alpine environments.
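
The 10-fold cross-validation described in the abstract, and the "proportion of variability explained" figure it reports, can be illustrated with a minimal sketch. This is not the authors' code. The data are synthetic, the single predictor `vegetation_index` is a hypothetical stand-in for the 12 environmental variables, and a one-variable least-squares line stands in for the BRT and RF models, purely so the example runs with the Python standard library alone. Only the sample count (105) and the number of folds (10) come from the abstract.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. ASSUMPTION: stand-in for BRT or RF."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def r_squared(observed, predicted):
    """Proportion of the observed variability explained by the predictions."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k=10, seed=0):
    """Predict every sample from a model fitted without that sample's fold."""
    n = len(xs)
    predicted = [None] * n
    for held_out in k_fold_indices(n, k, seed):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predicted[i] = model(xs[i])
    return r_squared(ys, predicted), predicted


# Synthetic data (ASSUMPTION): 105 samples, SOC loosely tied to one
# hypothetical vegetation index plus Gaussian noise.
rng = random.Random(42)
vegetation_index = [rng.uniform(0.1, 0.9) for _ in range(105)]
soc = [5.0 + 40.0 * v + rng.gauss(0.0, 5.0) for v in vegetation_index]

r2, predictions = cross_validate(vegetation_index, soc, k=10)
print(f"cross-validated R^2: {r2:.2f}")
```

Because every prediction comes from a model that never saw the corresponding sample, the resulting R² estimates out-of-sample performance rather than goodness of fit. Comparing two candidate models, as the abstract recommends, would amount to swapping `fit_line` for each model in turn and comparing the two scores on the same folds.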