Identifying the influential features on the regional energy use intensity of residential buildings based on Random Forests

Ma, Jun, Cheng, Jack C.P.
Applied energy 2016 v.183 pp. 193-201
algorithms, education, energy, fuel oils, household income, households, linear models, local government, regression analysis, residential housing, transportation, urban planning, New York
Efficient and effective city planning to improve the energy performance of residential buildings requires a clear understanding of the influential features. Previous studies modeling the relationships between influential features and energy consumption have several gaps and limitations, such as reliance on linear modeling and insufficient consideration of particular features. This study therefore investigates the influence of 171 possibly related features on the regional energy use intensity (EUI) of residential buildings using a non-linear regression algorithm, namely Random Forests (RF). New York City (NYC) was chosen as the study area because of data availability. The 171 features cover seven aspects: building, economy, education, environment, households, surrounding, and transportation. The average site EUI of the residential buildings in each Block Group (BG) was set as the dependent variable. The RF regression model was compared with models built using typical linear methods, such as Multiple Linear Regression and Lasso. The results show that the RF model achieved a lower mean squared error. In addition, the top 20 influential features were identified based on the out-of-bag estimation in RF. Results show that a lower percentage of well-educated people, a higher percentage of households heated by fuel oil, lower household income, and more residential complaints per capita are correlated with higher average site EUI in NYC. Related suggestions for improving energy performance in different regions are offered to the local government.
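The workflow described in the abstract (fit a Random Forest, compare its mean squared error with linear baselines, then rank features) can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes scikit-learn, uses a small synthetic dataset with 10 features in place of the 171 NYC features, and all variable names are hypothetical. scikit-learn does not expose out-of-bag permutation importance directly, so the sketch substitutes permutation importance on a held-out set and reports the out-of-bag R² score only as a fit check.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data (assumption): 10 features,
# of which only features 0 and 1 drive the target, both non-linearly.
rng = np.random.default_rng(0)
n_samples, n_features = 600, 10
X = rng.normal(size=(n_samples, n_features))
y = 2 * np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two linear baselines and a Random Forest with out-of-bag scoring enabled.
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "rf": RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0),
}

# Fit each model and record its mean squared error on the held-out set.
mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(X_test))

rf = models["rf"]
oob_r2 = rf.oob_score_  # R^2 estimated from out-of-bag samples

# Stand-in for OOB-based importance: permutation importance on held-out data.
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)
top_features = np.argsort(perm.importances_mean)[::-1][:2]

for name, value in mse.items():
    print(f"{name}: test MSE = {value:.3f}")
print(f"RF out-of-bag R^2 = {oob_r2:.3f}")
print("Top features by permutation importance:", top_features.tolist())
```

On data like this, where the target depends non-linearly on its inputs, the linear baselines cannot capture the relationship, which mirrors the kind of gap the abstract attributes to linear modeling. The ranking step here selects only the top 2 features; the study reports the top 20.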