Needles in a haystack: mapping rare and infrequent crops using satellite imagery and data balancing methods

Waldner, François, Chen, Yang, Lawes, Roger, Hochman, Zvi
Remote sensing of environment 2019
agricultural land, algorithms, cropping sequence, cropping systems, crops, data collection, food security, prices, remote sensing, time series analysis, Victoria (Australia)
Most cropping systems around the world are organised around a few dominant crops and a larger number of less frequent crops. While rare and infrequent crops occupy a small share of the cropped area, they produce ecological benefits on farmland, contribute to sustainability and help provide food and nutritional security. However, data about their location and extent derived from satellite imagery generally lack accuracy, largely owing to the class imbalance problem. Class imbalance occurs when only a few instances of some classes are available for training classifiers, and it leads to large classification errors for the infrequent classes. In this study, we assessed the magnitude of the class imbalance problem in crop classification and evaluated balancing methods that combat it either by creating synthetic minority instances or by removing majority instances. To that end, we generated 18 unbalanced data sets from Sentinel-2 time series and crop type observations in Victoria, Australia. These data sets covered a wide range of complexity, number of classes, number of samples per class and spectral separability, which enabled us to gather evidence about the benefits and drawbacks of balancing methods in various settings. Classification accuracy was assessed with two metrics: the Overall Accuracy (OA), which gives more weight to majority classes, and the G-Mean accuracy (GM), which is more sensitive to minority classes. Results showed that class imbalance explained nearly 40% of the accuracy variability. We found that balancing methods boosted GM by 0.01-0.54, but no single best solution emerged. The price for increasing the accuracy of minority classes was a drop in OA of a magnitude that was problem- and method-specific. We thus applied an algorithm selection method to identify optimal balancing methods in a computationally economical fashion. Optimal balancing methods lead to the maximum gain in GM and the minimum loss in OA. We demonstrated that this approach either successfully identified optimal balancing methods or ones that were not significantly sub-optimal, while reducing the computational cost by up to 60%. It can readily be incorporated into operational crop classification systems with little disruption to the existing processing chains. This contribution paves the way for achieving a more comprehensive and detailed view of crop distribution and cropping sequences.
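The contrast between the two metrics can be illustrated with a small sketch. It assumes a common definition of G-Mean as the geometric mean of per-class recall; the abstract does not give the formula, so the paper's exact formulation may differ. The function names, the crop labels ("wheat", "lentil") and the toy predictions are hypothetical and are not taken from the study. The last function shows random undersampling, one simple way of removing majority instances; it is not necessarily one of the balancing methods the authors evaluated.

```python
import math
import random
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Share of samples labelled correctly; dominated by majority classes."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-Mean).

    The score falls to 0 as soon as one class is missed entirely.
    """
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1 / len(recalls))


def random_undersample(samples, labels, seed=0):
    """Randomly drop instances until every class matches the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        for sample in rng.sample(items, n_min):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 95 majority samples and 5 minority samples.
y_true = ["wheat"] * 95 + ["lentil"] * 5
# A classifier that gets every wheat sample right but only one lentil sample.
y_pred = ["wheat"] * 95 + ["wheat"] * 4 + ["lentil"] * 1

oa = overall_accuracy(y_true, y_pred)  # 96 correct out of 100
gm = g_mean(y_true, y_pred)            # sqrt(1.0 * 0.2)

# Balance the toy training set by removing majority instances.
balanced_x, balanced_y = random_undersample(list(range(100)), y_true)
```

In this toy case OA stays high even though four of the five minority samples are misclassified, while GM is pulled down by the low minority recall. This is the behaviour the abstract attributes to the two metrics.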