
Simple but efficient signal pre-processing in soil organic carbon spectroscopic estimation

Vašát, Radim; Kodešová, Radka; Klement, Aleš; Borůvka, Luboš
Geoderma 2017 v.298 pp. 46-53
Cambisols, Chernozems, Leptosols, Luvisols, adverse effects, data collection, least squares, prediction, reflectance, reflectance spectroscopy, soil organic carbon, wavelengths, wavelet
No single pre-processing method, or combination of methods, works best with all data sets. Choosing the most suitable one is therefore a key step in soil diffuse reflectance spectroscopy in the visible and near-infrared region (400–2500 nm). Commonly used pre-processing methods include tools for spectral smoothing and/or noise reduction (e.g. Savitzky-Golay (SG) filtering or discrete wavelet transformation (DWT)), light-scatter correction (multiplicative scatter correction (MSC), standard normal variate (SNV)), baseline normalization techniques that cope with vertical offset and/or slope effects (e.g. continuum removal (CR), first- and second-order derivatives (FD and SD)), and other transformations (e.g. the logarithmic transformation log(1/R)). All of these tools aim to eliminate or reduce unwanted side effects (artifacts) in the spectra and to enhance the recognition of relevant information. For estimating soil organic carbon content with partial least squares regression (PLSR) calibration, smoothing with an SG filter, alone or combined with CR, usually ensures a reliable estimation. However, common CR has a few shortcomings. An approximation is used to connect the pivot points of the spectrum to derive the continuum. More problematically, CR does not capture the vertical shift at the very beginning of the spectrum: because the continuum passes through the first point, the CR value there always equals one. We therefore modified the procedure so that the reflectance values at each wavelength were divided not by the continuum but by the maximum reflectance value of the particular spectrum. This correction by the maximum reflectance (CMR) was tested against the eight other methods mentioned above at four study sites that differ in their prevailing soil units.
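The contrast between CR and CMR can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code. It assumes the common convex-hull continuum with straight-line segments between pivot points. The function names (`upper_hull`, `continuum_removal`, `cmr`) and the six-band spectra are hypothetical, illustrative values. Two spectra that differ only by a constant vertical offset both receive a CR value of exactly 1 at the first wavelength, whereas CMR keeps them apart.

```python
# Minimal sketch (not the authors' implementation) contrasting continuum
# removal (CR) with correction by the maximum reflectance (CMR).
# Assumption: the continuum is the upper convex hull of the spectrum with
# straight-line segments between pivot points. Wavelengths must increase strictly.


def upper_hull(wavelengths, reflectance):
    """Return indices of the upper convex hull (pivot points), left to right."""
    hull = []
    for i in range(len(wavelengths)):
        # Drop the last pivot while it lies on or below the line joining the
        # previous pivot and the current point (monotone-chain algorithm).
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = ((wavelengths[i1] - wavelengths[i0]) * (reflectance[i] - reflectance[i0])
                     - (reflectance[i1] - reflectance[i0]) * (wavelengths[i] - wavelengths[i0]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def continuum_removal(wavelengths, reflectance):
    """Hypothetical helper: divide each value by the linearly interpolated hull."""
    hull = upper_hull(wavelengths, reflectance)
    result, seg = [], 0
    for i, x in enumerate(wavelengths):
        # Advance to the hull segment whose right pivot lies at or beyond x.
        while seg < len(hull) - 2 and wavelengths[hull[seg + 1]] < x:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        frac = (x - wavelengths[a]) / (wavelengths[b] - wavelengths[a])
        cont = reflectance[a] + (reflectance[b] - reflectance[a]) * frac
        result.append(reflectance[i] / cont)
    return result


def cmr(reflectance):
    """Hypothetical helper: divide each value by the spectrum's maximum reflectance."""
    peak = max(reflectance)
    return [r / peak for r in reflectance]


# Illustrative (synthetic) spectra: B is A shifted upward by a constant 0.05.
wl = [400, 800, 1200, 1600, 2000, 2400]
spec_a = [0.10, 0.20, 0.25, 0.22, 0.30, 0.28]
spec_b = [r + 0.05 for r in spec_a]

for name, spec in (("A", spec_a), ("B", spec_b)):
    cr = continuum_removal(wl, spec)
    print(name, "CR first band:", round(cr[0], 3), "CMR first band:", round(cmr(spec)[0], 3))
```

Running it shows a CR value of 1.0 at the first band for both spectra, while the CMR values differ. Because CMR divides by a single constant per spectrum, it keeps the spectral shape unchanged and always maps the maximum band to 1.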
On site 1 (Haplic Chernozem), CMR gave a significantly more accurate prediction (R²cv = 0.845) than raw (but smoothed) spectra (0.815). On site 2 (Rendzic Leptosol), the most accurate predictions were achieved equally with CMR, MSC, SNV, log(1/R), DWT and raw spectra (R²cv from 0.560 to 0.592). On site 3 (Haplic Cambisol), MSC and CMR performed equally well (both R²cv = 0.767), and only these two differed significantly from the raw spectra. On site 4 (Haplic Luvisol), FD was the only method that yielded a significantly more accurate prediction than raw spectra (R²cv = 0.611). For the remaining methods except SD, it made no difference whether raw or transformed spectra were used (R²cv from 0.499 to 0.591). Finally, with the whole data set, the differences between pre-processing methods were even less pronounced. No method differed significantly from raw spectra except SD, which was significantly worse, although all predictions were generally more accurate (R²cv from 0.811 to 0.831).
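The R²cv values above are coefficients of determination obtained from cross-validated predictions. The abstract does not state the exact definition or the cross-validation scheme. The following is therefore a minimal sketch under two assumptions. First, R²cv = 1 - SS_res/SS_tot is computed on out-of-fold predictions. Second, the data are split into k folds by index. `fit_line` is a hypothetical one-variable least-squares stand-in for the PLSR calibration used in the study, and the data are synthetic. Comparing pre-processing methods would mean running the same cross-validation on each transformed spectral data set.

```python
from statistics import mean


def r2_cv(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot from out-of-fold predictions (assumed definition)."""
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def kfold_predictions(x, y, fit, k=5):
    """Predict each sample with a model fitted on the other k - 1 folds.

    `fit(x_train, y_train)` must return a function that predicts one sample.
    Folds are assigned by index modulo k (an assumption; no shuffling).
    """
    n = len(y)
    predicted = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        # Indices fold, fold + k, fold + 2k, ... form the held-out fold.
        for i in range(fold, n, k):
            predicted[i] = model(x[i])
    return predicted


def fit_line(x_train, y_train):
    """Hypothetical stand-in for PLSR: ordinary least squares on a single predictor."""
    xm, ym = mean(x_train), mean(y_train)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x_train, y_train))
    sxx = sum((a - xm) ** 2 for a in x_train)
    slope = sxy / sxx
    return lambda v: ym + slope * (v - xm)


# Usage with synthetic data: one predictor value per sample (e.g. a single band).
x = [0.31, 0.28, 0.35, 0.22, 0.40, 0.26, 0.33, 0.29, 0.37, 0.24]
y = [1.9, 1.6, 2.2, 1.1, 2.6, 1.5, 2.0, 1.8, 2.4, 1.3]
print(round(r2_cv(y, kfold_predictions(x, y, fit_line, k=5)), 3))
```

In a method comparison like the one reported here, the same folds would be reused for every pre-processing variant. Differences in R²cv would then reflect the pre-processing rather than the split.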