A new approach to predict the missing values of algae during water quality monitoring programs based on a hybrid moth search algorithm and the random vector functional link network

Hussein, Ahmed M., Abd Elaziz, Mohamed, Abdel Wahed, Mahmoud S.M., Sillanpää, Mika
Journal of Hydrology 2019 v.575 pp. 852-863
algae, calcium, cost effectiveness, drinking water treatment, fuzzy logic, monitoring, moths, neural networks, nitrates, pH, processing time, support vector machines, surface water, water quality, Egypt
Here, we propose a new alternative machine learning method that combines the advantages of the Random Vector Functional Link Network (RVFL) with the Moth Search Algorithm (MSA) to predict the missing values of total algal count during water quality monitoring of the surface waters that supply drinking water treatment plants in Fayoum, Egypt. A total of 34 water quality parameters were measured in 270 water samples during the period 2015–2017. The MSA algorithm was used to select the optimal input features and thereby improve the performance of the RVFL. The missing values of total algal count predicted by the proposed MSA-RVFL method were strongly correlated with the observed values. The MSA-RVFL outperformed the Support Vector Machine (SVM) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) models across different training set sizes. Compared with the GA-RVFL and PSO-RVFL methods, the MSA-RVFL was more effective at minimizing the number of input variables and reducing processing time. The MSA-RVFL model reduced the number of input variables from thirty-four to eighteen and eventually to four. The most significant variables selected by the MSA-RVFL to predict the total algal count were pH, NO₃, P and Ca. Based on these four variables, the predicted algae values closely matched the real observations (R² = 0.9594). This makes the MSA-RVFL model a useful, cost-efficient tool for water quality monitoring programs. Finally, the MSA-RVFL performed well whether the number of inputs was large or small, which gives the proposed method an advantage over traditional ANN models.
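The abstract does not give the RVFL configuration or the details of the MSA search, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It assumes an RVFL with a sigmoid hidden layer whose weights and biases are drawn uniformly from [-1, 1] and then frozen, direct links from the raw inputs to the output, and output weights obtained in closed form by ridge-regularized least squares. It also assumes that a wrapper feature selector (MSA, GA or PSO) scores a binary feature mask by the held-out R² of an RVFL trained on the selected columns, lightly rewarded for using fewer features. The names `RVFL`, `mask_fitness` and `r2_score`, the weighting `alpha`, and all parameter values are hypothetical. The moth search itself is not shown, only the objective such a search would maximize.

```python
import math
import random


def _solve(a, b):
    """Solve a @ x = b (square a) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class RVFL:
    """Hypothetical RVFL regressor: random frozen hidden layer + direct links."""

    def __init__(self, n_hidden=20, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _features(self, x):
        # Sigmoid hidden activations, concatenated with raw inputs (direct links) and a bias term.
        hidden = [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                  for w, b in zip(self.w, self.b)]
        return list(x) + hidden + [1.0]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d = len(X[0])
        # Hidden weights and biases are random and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        D = [self._features(x) for x in X]
        p = len(D[0])
        # Ridge normal equations: (D^T D + ridge * I) beta = D^T y.
        A = [[sum(row[i] * row[j] for row in D) + (self.ridge if i == j else 0.0)
              for j in range(p)] for i in range(p)]
        rhs = [sum(row[i] * yi for row, yi in zip(D, y)) for i in range(p)]
        self.beta = _solve(A, rhs)
        return self

    def predict(self, X):
        return [sum(f * bt for f, bt in zip(self._features(x), self.beta)) for x in X]


def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mask_fitness(mask, X_train, y_train, X_test, y_test, alpha=0.99):
    """Score a 0/1 feature mask; a wrapper search would try to maximize this."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return float("-inf")

    def select(X):
        return [[row[i] for i in cols] for row in X]

    model = RVFL().fit(select(X_train), y_train)
    r2 = r2_score(y_test, model.predict(select(X_test)))
    # Mostly accuracy, with a small bonus for dropping features (hypothetical weighting).
    return alpha * r2 + (1 - alpha) * (1 - len(cols) / len(mask))


if __name__ == "__main__":
    # Synthetic data: the target depends on features 0 and 1; feature 2 is irrelevant.
    rng = random.Random(1)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(80)]
    y = [2 * x[0] - x[1] + 0.5 for x in X]
    Xtr, ytr, Xte, yte = X[:60], y[:60], X[60:], y[60:]
    print(mask_fitness([1, 1, 0], Xtr, ytr, Xte, yte))
    print(mask_fitness([0, 0, 1], Xtr, ytr, Xte, yte))
```

On this synthetic data, the mask that keeps only the two informative features should score far higher than the mask that keeps only the irrelevant one. This is the kind of signal a metaheuristic such as MSA would exploit to prune 34 candidate inputs down to a small subset. The closed-form output layer is what keeps each fitness evaluation cheap, since no iterative training is needed.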