Evaluation of predictive capabilities of ordinary geostatistical interpolation, hybrid interpolation, and machine learning methods for estimating PM2.5 constituents over space

Requia, Weeberb J., Coull, Brent A., Koutrakis, Petros
Environmental research 2019 v.175 pp. 421-433
Bayesian theory, algorithms, aluminum, artificial intelligence, copper, environmental health, exposure assessment, forests, geostatistics, iron, kriging, land use, lead, models, nickel, particulates, regression analysis, remote sensing, titanium, uncertainty, vanadium, variance, zinc, Massachusetts
Numerous modeling approaches to estimate concentrations of PM2.5 components have been developed to derive better exposures for health studies, including geostatistical interpolation approaches, land use regression models, and models based on remote sensing technology. Recently, there have been some efforts to develop models based on machine learning algorithms. Each of these exposure assessment methods has inherent uncertainties, resulting in varying levels of exposure misclassification. To date, only a few studies have attempted to systematically compare exposure estimates from different PM2.5 constituent models. Our research addresses this gap by comparing the predictive capabilities of ordinary geostatistical interpolation (Ordinary Kriging, OK), hybrid interpolation (a combination of Empirical Bayesian Kriging and land use regression), and machine learning techniques (forest-based regression) for estimating PM2.5 constituents in Eastern Massachusetts in the United States. We compared the estimates of 10 ambient PM2.5 components: Al, Cu, Fe, K, Ni, Pb, S, Ti, V, and Zn. The OK model performed the poorest for all PM2.5 components, with R² values under 0.30. The hybrid model offered a slight improvement, especially for Cu and Fe, for which R² increased to 0.62 and 0.59, respectively; these were the highest R² values achieved by the hybrid model. The forest model performed best, with R² values above 0.7 for most of the particle components, including Cu, Fe, Ni, Pb, Ti, and V. As with the hybrid model, the forest models for Cu and Fe explained the most concentration variance, with R² values of 0.88 and 0.92, respectively. The forest models for K, S, and Zn performed the poorest, with R² values of 0.54, 0.37, and 0.44, respectively. These results can help the environmental health community estimate PM2.5 constituents over space more accurately.
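To make the baseline method and the R² comparison more concrete, here is a minimal sketch of ordinary kriging (OK) together with a leave-one-out R² score. It is an illustration only, not the authors' implementation. It assumes an exponential semivariogram with hand-picked, hypothetical parameters (`nugget`, `sill`, `range_`), whereas in practice these parameters would be fitted to the data. It also assumes R² is computed as 1 − SS_res/SS_tot on leave-one-out predictions. The abstract does not state the validation scheme or R² definition used in the study, and the hybrid (Empirical Bayesian Kriging plus land use regression) and forest-based models are not shown.

```python
import numpy as np


def exponential_variogram(h, nugget=0.0, sill=1.0, range_=10.0):
    """Exponential semivariogram gamma(h); parameter values are illustrative assumptions."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-h / range_))
    # By definition gamma(0) = 0, even when a nugget effect is present.
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy_obs, z_obs, xy_new, **vario):
    """Predict values at xy_new by ordinary kriging (unknown but constant mean)."""
    n = len(z_obs)
    # Pairwise distances between observation sites.
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    # Kriging system [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1].
    # The last row enforces that the weights sum to 1 (unbiasedness).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exponential_variogram(d_obs, **vario)
    a[n, n] = 0.0
    preds = []
    for p in np.atleast_2d(xy_new):
        b = np.ones(n + 1)
        b[:n] = exponential_variogram(np.linalg.norm(xy_obs - p, axis=1), **vario)
        sol = np.linalg.solve(a, b)
        preds.append(sol[:n] @ z_obs)  # sol[n] is the Lagrange multiplier
    return np.array(preds)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return 1.0 - ss_res / ss_tot


def loocv_r2(xy_obs, z_obs, **vario):
    """Leave-one-out R^2: predict each site from all of the other sites."""
    n = len(z_obs)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = ordinary_kriging(xy_obs[keep], z_obs[keep], xy_obs[i], **vario)[0]
    return r_squared(z_obs, preds)


# Usage with synthetic (made-up) monitor locations and concentrations.
rng = np.random.default_rng(0)
sites = rng.uniform(0, 50, size=(20, 2))
conc = np.sin(sites[:, 0] / 10) + 0.1 * sites[:, 1]
print(loocv_r2(sites, conc, nugget=0.0, sill=1.0, range_=10.0))
```

With a zero nugget, OK is an exact interpolator: predicting at a monitor location returns the observed value there. For that reason, the sketch scores the model on held-out sites rather than on the fitting data.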