
Selection of significant input variables for time series forecasting

Tran, H.D., Muttil, N., Perera, B.J.C.
Environmental Modelling & Software 2015 v.64 pp. 156-163
computer software, data collection, environmental models, time series analysis
Appropriate selection of inputs for time series forecasting models is important because it not only has the potential to improve the performance of forecasting models but also helps reduce the cost of data collection. This paper investigates the selection performance of three input selection techniques: two model-free techniques, partial linear correlation (PLC) and partial mutual information (PMI), and a model-based technique based on genetic programming (GP). Four hypothetical datasets and two real datasets were used to demonstrate the performance of the three techniques. The results suggested that two techniques are recommended for the input selection task: the model-free PLC technique, owing to its computational simplicity, and the model-based GP technique, owing to its ability to detect non-linear relationships (demonstrated by its relatively good performance on a hypothetical complex non-linear dataset). Candidate inputs selected by both of these recommended techniques should be considered significant inputs.
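The record does not say how PLC is computed. The following is a minimal Python sketch of one common form of partial-linear-correlation input selection, greedy forward selection. It is not presented as the authors' procedure. The sketch makes several assumptions. The candidate inputs are lags 1 to 5 of a synthetic white-noise series, and the target depends only on lags 1 and 3. The stopping rule is a fixed threshold of 0.2, chosen purely for illustration. The names `partial_corr` and `plc_select` are hypothetical. At each step, the candidate with the largest absolute partial correlation with the target is added, conditional on the inputs already selected. The search stops when that largest value falls below the threshold.

```python
import numpy as np


def partial_corr(y, x, z):
    """Correlation of y and x after linearly removing the columns of z (plus an intercept)."""
    n = len(y)
    design = np.column_stack([np.ones(n), z]) if z.shape[1] > 0 else np.ones((n, 1))
    # Residuals of least-squares fits of y and x on the conditioning set
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    return float(np.corrcoef(ry, rx)[0, 1])


def plc_select(candidates, y, threshold=0.2):
    """Greedy forward selection by partial linear correlation (illustrative threshold)."""
    selected = []
    remaining = list(range(candidates.shape[1]))
    while remaining:
        z = candidates[:, np.array(selected, dtype=int)]  # shape (n, 0) on the first pass
        scores = {j: abs(partial_corr(y, candidates[:, j], z)) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:  # no remaining candidate adds enough information
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic example (assumed data): y_t = 2 x_{t-1} - 1.5 x_{t-3} + noise
rng = np.random.default_rng(0)
n, max_lag = 600, 5
x = rng.standard_normal(n + max_lag)
# Column k-1 holds x_{t-k} for k = 1..max_lag
lags = np.column_stack([x[max_lag - k : n + max_lag - k] for k in range(1, max_lag + 1)])
y = 2.0 * lags[:, 0] - 1.5 * lags[:, 2] + 0.5 * rng.standard_normal(n)

print([j + 1 for j in plc_select(lags, y)])  # selected lag numbers, expected: [1, 3]
```

Because the lagged white-noise candidates are mutually independent here, the irrelevant lags have partial correlations close to zero. Only lags 1 and 3 pass the threshold. With correlated candidates, or with non-linear dependence, a purely linear measure like this one can miss or mis-rank inputs. That limitation is the motivation the abstract gives for also considering PMI and GP.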