TY - JOUR
DP - National Agricultural Library
DB - PubAg
JO - Computers and electronics in agriculture
TI - Optimizing wavelength selection by using informative vectors for parsimonious infrared spectra modelling
A1 - Ng, Wartini
A4 - Ng, Wartini
A4 - Minasny, Budiman
A4 - Malone, Brendan P.
A4 - Sarathjith, M.C.
A4 - Das, Bhabani S.
EP - 2019 v.158
KW - algorithms
KW - carbon
KW - cation exchange capacity
KW - clay
KW - covariance
KW - data collection
KW - infrared spectroscopy
KW - least squares
KW - models
KW - pH
KW - prediction
KW - sand fraction
KW - wavelengths
AN - 6309359
AB - Infrared spectroscopy has been widely adopted in agricultural research. A typical spectrum contains thousands of wavelengths. This large number of spectral variables often contributes collinearity and redundancy rather than relevant information. Variable selection of the predictors is therefore an important step in creating a robust calibration model from spectral data. This paper presents an algorithm for spectral variable selection based on a combination of informative vectors and an ordered predictor selection (OPS) approach with an exponentially decreasing function (EDF) selection. Informative vectors are features derived from statistical principles that can be used to describe the relationship between the dependent variables and the predictors (spectra). The informative vectors analysed include the regression coefficient vector (b), variable influence on projection (V), residual vector (S), net analyte signal vector (Na), linear correlation vector (COR), biweight mid-correlation vector (BIC), mutual information based on adjacency matrix (AMI), and covariance procedures matrix (COV). These eight informative vectors can be joined in pairs to form 28 combination vectors. The approach was tested with near-infrared soil spectra for predicting pH, clay and sand content, cation exchange capacity (CEC), and total carbon content. Cubist regression tree and partial least squares regression (PLSR) models were used for calibration. By utilizing a subset of the spectra (retaining the wavelengths that are significant based on the absolute values of the informative vectors), the regression models were still able to retain their prediction capability. Overall, the PLSR model performed better than the Cubist model. The informative vectors b and S (and their combinations) were found to provide the most accurate predictions for this dataset. Although the subset model does not perform better than the full-spectrum model, the number of wavelength variables used in the model is significantly reduced to, on average, 25%.
PY - 2019
LA -
DA - 2019-03
VL - v. 158
SP - pp. 201-210
DO - 10.1016/j.compag.2019.02.003
ER -