Missing data, imputation, and endogeneity

McDonough, Ian K., Millimet, Daniel L.
Journal of econometrics 2017 v.199 no.2 pp. 141-155
birth weight, econometric models, economic analysis, economic theory
Basmann (1957, 1959) introduced two-stage least squares (2SLS). In subsequent work, Basmann et al. (1971) investigated its finite sample performance. Here we build on this tradition, focusing on the issue of 2SLS estimation of a structural model when data on the endogenous covariate is missing for some observations. Many imputation techniques have been proposed in the literature. However, there is little guidance available for choosing among existing techniques, particularly when the covariate being imputed is endogenous. Moreover, because the finite sample bias of 2SLS is not monotonically decreasing in the degree of measurement accuracy, the most accurate imputation method is not necessarily the method that minimizes the bias of 2SLS. Instead, we explore imputation methods designed to increase the first-stage strength of the instrument(s), even if such methods entail lower imputation accuracy. We do so via simulations as well as with an application related to the medium-run effects of birth weight.
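
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It is a minimal illustration under assumed conditions: one instrument, values missing completely at random, and hypothetical parameter values (`beta`, `pi`, `miss_rate`) and function names (`two_sls`, `simulate`) chosen only for this example. It compares complete-case 2SLS with 2SLS after two simple imputations of the endogenous covariate: the observed mean, and a linear prediction from the instrument.

```python
import numpy as np


def two_sls(y, x, z):
    """Just-identified 2SLS slope of y on x with instrument z (intercepts included)."""
    Z = np.column_stack([np.ones_like(z), z])
    X = np.column_stack([np.ones_like(x), x])
    # First stage: project the regressors on the instrument set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the first-stage fitted values.
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return coef[1]


def simulate(n=20000, beta=1.0, pi=0.5, miss_rate=0.3, seed=0):
    """Hypothetical design: all parameter values are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # instrument
    u = rng.normal(size=n)                    # structural error
    v = 0.8 * u + 0.6 * rng.normal(size=n)    # first-stage error, correlated with u
    x = pi * z + v                            # endogenous covariate
    y = beta * x + u                          # outcome
    missing = rng.random(n) < miss_rate       # missing completely at random (assumed)
    obs = ~missing

    # (a) Complete-case analysis: drop observations with missing x.
    b_cc = two_sls(y[obs], x[obs], z[obs])

    # (b) Mean imputation: replace missing x with the observed mean.
    x_mean = x.copy()
    x_mean[missing] = x[obs].mean()
    b_mean = two_sls(y, x_mean, z)

    # (c) Regression imputation: predict missing x from the instrument,
    #     using coefficients fitted on the observed sample.
    Z_obs = np.column_stack([np.ones(obs.sum()), z[obs]])
    g = np.linalg.lstsq(Z_obs, x[obs], rcond=None)[0]
    x_reg = x.copy()
    x_reg[missing] = g[0] + g[1] * z[missing]
    b_reg = two_sls(y, x_reg, z)

    return {
        "complete_case": b_cc,
        "mean_imputation": b_mean,
        "regression_imputation": b_reg,
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, mean imputation leaves the reduced form unchanged but shrinks the covariance between the instrument and the imputed covariate by roughly the missing share, which pushes the 2SLS estimate away from `beta`. Imputing from the instrument keeps that covariance intact, so the estimate should stay close to `beta`. This shows, in a simple case, why the choice of imputation method interacts with first-stage strength.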