Information loss in approximately Bayesian estimation techniques: A comparison of generative and discriminative approaches to estimating agricultural productivity

Nearing, Grey S., Gupta, Hoshin V., Crow, Wade T.
Journal of hydrology 2013 v.507 pp. 163-173
biomass, environmental models, remote sensing
Data assimilation and regression are two commonly used methods for combining models and remote sensing observations to estimate agricultural productivity. Data assimilation is a generative approach because it requires explicit approximations of a Bayesian prior and likelihood to compute a probability density function of biomass conditional on observations, and regression is discriminative because it models the conditional biomass density function directly. Both of these methods typically approximate Bayes’ law and therefore cannot be expected to be perfectly efficient at extracting information from remote sensing observations. In this paper we measure information in observations using Shannon’s theory and define missing information, used information, and bad information as partial divergences from the true Bayesian posterior (biomass conditional on observations). These concepts were applied to directly measure the amount and quality of information about end-of-season biomass extracted from observations by the ensemble Kalman filter (EnKF) and Gaussian process regression (GPR). Results suggest that the simpler discriminative approach can be as efficient as the more complex generative approach in terms of extracting high quality information from observations, and may therefore be better suited to dealing with the practical problems associated with remotely sensed data (e.g., sub-footprint scale heterogeneity). Our method for analyzing information use has many potential applications: approximations of Bayes’ law are used regularly in predictive models of environmental systems of all kinds, and the efficiency of such approximations has heretofore not been directly measured.
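
As a rough illustration of the quantities the abstract refers to, the following minimal sketch computes Shannon mutual information between a discretized biomass variable and an observation, and the Kullback-Leibler divergence of an approximate posterior from the true one. It does not reproduce the paper's decomposition into missing, used, and bad information, whose exact definitions are not given in the abstract. The two-class joint table and the approximate posterior are hypothetical values chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """Shannon mutual information I(X; Y) in bits from a joint probability table."""
    px = [sum(row) for row in joint]          # marginal over rows (biomass)
    py = [sum(col) for col in zip(*joint)]    # marginal over columns (observation)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


# Hypothetical joint distribution: rows are biomass classes (low, high),
# columns are observation classes (low, high). Values are assumptions.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# Information about biomass that the observation carries in principle.
info_available = mutual_information(joint)

# True Bayesian posterior p(biomass | observation = low), from the joint table.
p_obs_low = joint[0][0] + joint[1][0]
true_posterior = [joint[0][0] / p_obs_low, joint[1][0] / p_obs_low]

# Hypothetical posterior produced by some approximate method.
approx_posterior = [0.7, 0.3]

# Divergence of the approximation from the true posterior, in bits.
divergence = kl_divergence(true_posterior, approx_posterior)

print(f"available information: {info_available:.4f} bits")
print(f"divergence of approximation: {divergence:.4f} bits")
```

A divergence of zero would mean the approximate method recovers the true posterior exactly for that observation; larger values indicate a greater departure from the true posterior.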