Properties of Endogenous Post-Stratified Estimation using remote sensing data
- Tipton, John, Opsomer, Jean, Moisen, Gretchen
- Remote sensing of environment 2013 v.139 pp. 130-137
- Landsat, USDA Forest Service, artificial intelligence, canopy, forest inventory, geographic information systems, linear models, prediction, remote sensing, surveys, trees, variance
- Post-stratification is commonly used to improve the precision of survey estimates. In traditional post-stratification methods, the stratification variable must be known at the population level. When suitable covariates are available at the population level, an alternative approach consists of fitting a model on the covariates, making predictions for the population, and then stratifying on these predicted values. This method is called Endogenous Post-Stratification Estimation (EPSE), and it is well suited to applications using remote sensing data. In this article, we investigate the performance of EPSE in a realistic setting using data from the United States Forest Service Forest Inventory and Analysis program and Landsat Enhanced Thematic Mapper Plus. This article has three specific objectives: first, to evaluate the statistical properties of EPSE when using linear regression, spline regression, and the machine learning tool Random Forests to predict tree canopy cover from remote sensing and Geographic Information System data; second, to investigate the effect on the EPSE variance estimator of using estimated stratum boundaries instead of fixed stratum boundaries; and third, to investigate the effect on the EPSE variance estimator of optimizing the stratum boundaries to minimize the variance estimate. The main findings of this article are that the EPSE variance estimator performs well using Random Forests, but can underestimate the true variance if the stratum boundaries are optimized in an attempt to minimize the variance estimate. This result supports the use of the EPSE estimator with remote sensing data in cases where the stratum boundaries are not optimized to minimize the variance estimate.
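
The three EPSE steps named in the abstract (fit a model on the covariates, predict for the whole population, stratify on the predictions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the function name `epse_mean`, the synthetic data, the fixed stratum boundaries, the use of ordinary least squares as the prediction model, and the classical post-stratification variance approximation under simple random sampling. The article's own estimator and variance formula may differ.

```python
import numpy as np

def epse_mean(x_pop, sample_idx, y_sample, boundaries):
    """Hypothetical sketch of an endogenous post-stratified mean.

    x_pop      : (N, p) covariates known for every population unit
    sample_idx : indices of the n sampled units (assumed simple random sample)
    y_sample   : response observed on the sampled units only
    boundaries : fixed, ascending cut points applied to the predictions
    """
    N = x_pop.shape[0]
    n = len(sample_idx)

    # Step 1: fit a linear model on the sample (intercept + covariates).
    design_pop = np.column_stack([np.ones(N), x_pop])
    beta, *_ = np.linalg.lstsq(design_pop[sample_idx], y_sample, rcond=None)

    # Step 2: predict the response for every population unit.
    pred_pop = design_pop @ beta

    # Step 3: stratify the population (and hence the sample) on the predictions.
    strata_pop = np.digitize(pred_pop, boundaries)
    strata_sample = strata_pop[sample_idx]

    estimate, variance = 0.0, 0.0
    for h in range(len(boundaries) + 1):
        w_h = np.mean(strata_pop == h)      # population share of stratum h
        if w_h == 0:
            continue                        # empty stratum contributes nothing
        y_h = y_sample[strata_sample == h]
        if len(y_h) < 2:
            raise ValueError(f"stratum {h} has fewer than 2 sampled units")
        s2_h = y_h.var(ddof=1)              # within-stratum sample variance
        estimate += w_h * y_h.mean()
        # Assumed classical post-stratification variance approximation.
        variance += (1 - n / N) * w_h * s2_h / n + (1 - w_h) * s2_h / n**2
    return estimate, variance

# Synthetic illustration only; these numbers are made up, not forest data.
rng = np.random.default_rng(0)
N, n = 5000, 400
x_pop = rng.uniform(size=(N, 2))
y_pop = 20 + 30 * x_pop[:, 0] + 10 * x_pop[:, 1] + rng.normal(0, 5, N)
sample_idx = rng.choice(N, size=n, replace=False)

est, var = epse_mean(x_pop, sample_idx, y_pop[sample_idx], [30.0, 40.0, 50.0])
print("EPSE estimate:", est, "estimated variance:", var)
print("true population mean:", y_pop.mean())
```

The boundaries in this sketch are fixed in advance. Choosing them by minimizing the returned variance would correspond to the optimization that the abstract reports can lead to underestimating the true variance.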