A Bayesian framework for joint structure and colour based pixel-wise classification of grapevine proximal images

Abdelghafour, F., Rosu, R., Keresztes, B., Germain, C., Da Costa, J.P.
Computers and Electronics in Agriculture 2019 v.158 pp. 345-357
Bayesian theory, Vitis, agronomic traits, canopy, color, image analysis, models, monitoring, normal distribution, phenology, precision agriculture, small fruits, texture, viticulture
Estimating the spatial variability of basic agronomic parameters at the scale of the plant is of prime importance for the development and monitoring of Precision Agriculture applications. It is all the more crucial in viticulture, where intra-plot variabilities are exacerbated. This paper focuses on the description of the structure of the canopy at the plant scale by proximal imaging. A new framework is proposed for the pixel-wise classification of the grapevine canopy into organs at different phenological stages. The proposed processing chain proceeds in four steps: (i) foreground extraction, (ii) pixel-wise feature extraction, (iii) pixel-wise classification and (iv) spatial regularisation. Step (i) is based on colour information only. For step (ii), colour is represented using an RGB triplet while texture is captured using the local structure tensor (LST). Two variants are proposed to associate colour and LST information into a single Euclidean vector. Step (iii) is a Bayesian decision process based on the joint modelling of colour and texture using multivariate Gaussian distributions. Finally, step (iv) combines stochastic relaxation and morphological filtering, allowing for the spatial regularisation of the classification output. This processing chain is applied to the pixel-wise classification of proximal images into grapevine organs. Images were taken from two 0.2 ha plots planted with the red variety “Merlot Noir” in the Bordeaux area, using an embedded acquisition system, at three key phenological stages: flowerhood falling, pea-sized berries and berries touching (BBCH 68, 75 and 79). Results are produced with leave-one-out cross-validations where models are estimated from only 15 images per stage containing about 3.2×10⁶ labelled pixels. The resulting classification performances are measured in terms of recall and precision, which overall reach between 85% and 95% depending on the stage, while overall accuracies range between 88% and 93%.
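To make the feature-extraction and classification steps (ii) and (iii) concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation: it computes a smoothed local structure tensor, stacks it with the RGB triplet into one Euclidean feature vector per pixel (one of the two association variants described is a simple concatenation; the exact variants in the paper may differ), and applies a maximum-a-posteriori decision rule with one multivariate Gaussian per class. All function and class names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def local_structure_tensor(gray, sigma=2.0):
    """Smoothed local structure tensor (LST) per pixel.

    Returns the three independent components (Jxx, Jxy, Jyy),
    i.e. the Gaussian-smoothed outer products of the image gradient.
    """
    gx = sobel(gray, axis=1)
    gy = sobel(gray, axis=0)
    jxx = gaussian_filter(gx * gx, sigma)
    jxy = gaussian_filter(gx * gy, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    return np.stack([jxx, jxy, jyy], axis=-1)

def pixel_features(rgb, sigma=2.0):
    """Concatenate colour (RGB) and texture (LST) into one vector per pixel.

    A naive association variant for illustration only; output shape (H, W, 6).
    """
    gray = rgb.mean(axis=-1)
    return np.concatenate([rgb, local_structure_tensor(gray, sigma)], axis=-1)

class GaussianBayesClassifier:
    """MAP decision rule with one multivariate Gaussian per class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.covs_, self.priors_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.means_.append(Xc.mean(axis=0))
            # Small ridge keeps the covariance invertible.
            self.covs_.append(np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
            self.priors_.append(len(Xc) / len(X))
        return self

    def predict(self, X):
        scores = []
        for m, S, p in zip(self.means_, self.covs_, self.priors_):
            d = X - m
            # Log-posterior up to a class-independent constant:
            # log p(c) - 0.5 * (Mahalanobis distance + log|S|)
            mah = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
            scores.append(np.log(p) - 0.5 * (mah + np.linalg.slogdet(S)[1]))
        return self.classes_[np.argmax(np.stack(scores, axis=0), axis=0)]
```

In use, one would flatten the (H, W, 6) feature map into an (H·W, 6) matrix, fit the Gaussians on the labelled training pixels, predict per pixel, and then regularise the resulting label map spatially (step (iv), not sketched here).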