A random forest partition model for predicting NO2 concentrations from traffic flow and meteorological conditions

Kamińska, Joanna A.
The Science of the total environment 2019 v.651 pp. 475-483
adverse effects, air, algorithms, artificial intelligence, models, nitrogen content, nitrogen dioxide, odors, pollution, prediction, relative humidity, temperature, traffic, trees, urbanization, wind direction, Poland
High concentrations of nitrogen dioxide in the air, particularly in heavily urbanised areas, adversely affect many aspects of residents' health (short-term and long-term damage, unpleasant odour and other effects). A method is proposed for modelling atmospheric NO2 concentrations in a conurbation using a partition model M consisting of two separate models: ML for lower concentration values and MU for upper values. Random forests, an advanced data mining technique, are used; this machine learning method combines information from multiple random trees. Using data recorded in Wrocław (Poland) in 2015–2017 as an example, an iterative method was applied to determine the boundary concentration ỹ for which the mean absolute deviation error of the partition model attained its lowest value. The resulting model had an R² value of 0.82, compared with 0.60 for a classical random forest model. As in the classical case, the variable importances in the model ML indicate that traffic flow has the greatest influence on NO2 concentrations, followed by meteorological factors, in particular wind direction and speed. In the model MU the variable importances are significantly different: while traffic flow still has the greatest impact, the effects of temperature and relative humidity are almost as great. This confirms that constructing separate models for low and high pollution concentrations is justified.
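
The partition idea in the abstract can be illustrated with a minimal Python sketch using scikit-learn. It is not the author's implementation and rests on several assumptions. The abstract does not say how an unseen observation is assigned to ML or MU, so a random forest classifier is used here as a hypothetical router. The data are synthetic stand-ins for traffic flow, wind speed and temperature. The candidate boundaries are taken from quantiles of the training target, and the boundary is chosen by mean absolute error on a held-out split. The function names and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def fit_partition_model(X, y, boundary, seed=0):
    """Fit M_L on samples with y <= boundary and M_U on the remaining samples."""
    upper = y > boundary
    model_l = RandomForestRegressor(n_estimators=50, random_state=seed)
    model_l.fit(X[~upper], y[~upper])
    model_u = RandomForestRegressor(n_estimators=50, random_state=seed)
    model_u.fit(X[upper], y[upper])
    # Assumption (not from the abstract): a classifier decides which
    # sub-model handles an unseen observation.
    router = RandomForestClassifier(n_estimators=50, random_state=seed)
    router.fit(X, upper)
    return model_l, model_u, router


def predict_partition_model(models, X):
    """Route each row to M_L or M_U and return the combined prediction."""
    model_l, model_u, router = models
    use_upper = router.predict(X).astype(bool)
    return np.where(use_upper, model_u.predict(X), model_l.predict(X))


def search_boundary(X_train, y_train, X_val, y_val, candidates):
    """Try each candidate boundary; return (boundary, MAE) with the lowest MAE."""
    best = None
    for b in candidates:
        models = fit_partition_model(X_train, y_train, b)
        mae = mean_absolute_error(y_val, predict_partition_model(models, X_val))
        if best is None or mae < best[1]:
            best = (float(b), float(mae))
    return best


# Synthetic data, purely illustrative: three hypothetical predictors in [0, 1].
rng = np.random.default_rng(0)
n = 600
traffic = rng.uniform(0, 1, n)
wind = rng.uniform(0, 1, n)
temp = rng.uniform(0, 1, n)
X = np.column_stack([traffic, wind, temp])
# Temperature only matters at high traffic, so high values behave differently.
y = 30 + 60 * traffic - 20 * wind + 40 * temp * (traffic > 0.7) + rng.normal(0, 3, n)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
candidates = np.quantile(y_train, [0.5, 0.6, 0.7, 0.8, 0.9])
boundary, mae = search_boundary(X_train, y_train, X_val, y_val, candidates)
print(f"chosen boundary: {boundary:.1f}, validation MAE: {mae:.2f}")
```

After fitting, the `feature_importances_` attribute of the two regressors can be compared to see whether the ranking of predictors differs between the lower and upper models, in the spirit of the comparison reported in the abstract.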