German solar power generation data mining and prediction with transparent open box learning network integrating weather, environmental and market variables

Wood, David A.
Energy conversion and management 2019 v.196 pp. 354-369
Lampyridae, algorithms, artificial intelligence, data collection, market prices, markets, power generation, prediction, solar energy, solar farms, time series analysis, weather, Germany
A compiled dataset of hourly-averaged solar power generation (MW) for Germany in 2016 integrates eight influencing weather, environmental and market price variables (8784 data records including 5012 non-zero generation periods). It provides valuable insight into solar power variations over the course of one year. The dataset is evaluated with the transparent open box (TOB) learning network for data mining and prediction purposes. This algorithm provides accurate and repeatable MW predictions and enables detailed evaluation of the key variables influencing each hourly data record. TOB Stage 1 applies a data matching routine driven by the squared errors between the independent variables. TOB Stage 2 applies a customized memetic firefly optimizer to minimize the root mean squared error (RMSE) for MW predictions over subsets and/or the full dataset. TOB achieves high prediction accuracy (average of five cases: RMSE = 1044.4 MW; R² = 0.975) using tuning subsets of only ∼303 data records (∼6% of the full dataset). The dataset displays some significant MW prediction outliers that are readily identified and explained individually by the TOB algorithm’s data mining capabilities. A slightly filtered dataset (4918 data records excluding 94 outlier data records) improves MW prediction accuracy (average of five cases: RMSE = 936.1 MW; R² = 0.980), while the prediction outliers are readily segregated as a separate subset for more detailed evaluation. The TOB algorithm’s combined machine-learning and data mining capabilities provide valuable insight into the dataset and the influences of its independent variables. The algorithm couples high prediction accuracy with detailed evaluation of long-term and short-term time series data, at spatial scales varying from country level to individual solar farms.
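
The abstract describes Stage 1 only as a data matching routine driven by squared errors between the independent variables, and it does not give formulas. The Python sketch below illustrates one plausible reading of that idea and is not the author's implementation. It assumes the variables are already scaled to comparable ranges, that each record is matched to its `q` closest records by weighted sum of squared differences, and that the prediction is an inverse-error weighted average of the matched MW values. The function names, the parameter `q`, and the uniform default `weights` are all hypothetical; in the paper, Stage 2 tunes the prediction with a memetic firefly optimizer, which is not reproduced here.

```python
import numpy as np

def match_and_predict(X_train, y_train, x_query, weights=None, q=10):
    """Predict a target value from the q best-matching training records.

    Assumption: matching uses a weighted sum of squared differences
    across the independent variables (hypothetical reading of Stage 1).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    if weights is None:
        # Uniform variable weights; an optimizer could tune these instead.
        weights = np.ones(X_train.shape[1])

    # Weighted sum of squared differences to every training record.
    sq_err = ((X_train - x_query) ** 2 * weights).sum(axis=1)

    # Indices of the q closest matches (smallest squared error first).
    top = np.argsort(sq_err)[:q]

    # Closer matches contribute more; the small constant avoids division by zero.
    inv = 1.0 / (sq_err[top] + 1e-12)
    return float(np.dot(inv, y_train[top]) / inv.sum())

def rmse(y_true, y_pred):
    """Root mean squared error, the metric the abstract reports in MW."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy usage with made-up numbers (not data from the paper).
X = [[0.0], [1.0], [2.0]]
y = [0.0, 10.0, 20.0]
pred = match_and_predict(X, y, [1.0], q=1)
```

Because each prediction is built from a short list of identifiable matching records, an outlier can be traced back to the specific records that produced it, which is the kind of record-level inspection the abstract attributes to the method.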