Robust regression estimation and inference in the presence of cellwise and casewise contamination

Leung, Andy, Zhang, Hongyang, Zamar, Ruben
Computational statistics & data analysis 2016 v.99 pp. 1-11
algorithms, data collection, models, regression analysis
Cellwise outliers are likely to occur together with casewise outliers in modern datasets of relatively large dimension. Recent work has shown that traditional robust regression methods may fail when applied to such datasets. We propose a new robust regression procedure to deal with casewise and cellwise outliers. The proposed method, called three-step regression, proceeds as follows: first, it uses a consistent univariate filter, that is, a procedure that flags and eliminates extreme cellwise outliers; second, it applies a robust estimator of multivariate location and scatter to the filtered data to down-weight casewise outliers; third, it computes robust regression coefficients from the estimates obtained in the second step. The three-step estimator is consistent and asymptotically normal at the central model under some assumptions on the tails of the distributions of the continuous covariates. The estimator is extended to handle both continuous and dummy covariates using an iterative algorithm. Extensive simulation results show that the three-step estimator is resilient to cellwise outliers. It also performs well under casewise contamination when compared to traditional high breakdown point estimators.
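
The three steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the filter or the scatter estimator in reproducible detail, so the code below substitutes simple stand-ins, all of which are assumptions. Step 1 uses a median/MAD cutoff on the covariates only, step 2 uses median imputation followed by a hard-rejection reweighted mean and covariance, and step 3 computes the coefficients from the partitioned location and scatter estimates. The function names, the cutoff 3.5, the trimming factor 4.0, and the simulated data are hypothetical. Dummy covariates are not handled.

```python
import numpy as np

def filter_cells(X, cutoff=3.5):
    """Step 1 (simplified stand-in): flag cells far from the column median,
    measured in MAD units, and mark them as missing (NaN)."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    flagged = np.abs(X - med) > cutoff * mad
    return np.where(flagged, np.nan, X)

def robust_location_scatter(Z, trim=4.0, n_iter=20):
    """Step 2 (stand-in, not the estimator used in the paper): impute flagged
    cells with column medians, then iterate a mean/covariance computed only on
    rows whose squared Mahalanobis distance is at most trim * median distance."""
    Z = np.asarray(Z, dtype=float)
    med = np.nanmedian(Z, axis=0)
    Zi = np.where(np.isnan(Z), med, Z)            # crude imputation (assumption)
    mad = 1.4826 * np.median(np.abs(Zi - med), axis=0)
    mu, sigma = med, np.diag(mad ** 2)            # robust, correlation-free start
    for _ in range(n_iter):
        diff = Zi - mu
        d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
        keep = d2 <= trim * np.median(d2)         # hard rejection of casewise outliers
        mu = Zi[keep].mean(axis=0)
        sigma = np.cov(Zi[keep], rowvar=False)
    return mu, sigma

def three_step_regression(X, y):
    """Step 3: regression coefficients from the joint location and scatter."""
    Xf = filter_cells(X)
    mu, sigma = robust_location_scatter(np.column_stack([Xf, y]))
    p = Xf.shape[1]
    beta = np.linalg.solve(sigma[:p, :p], sigma[:p, p])   # Sigma_xx^{-1} Sigma_xy
    intercept = mu[p] - mu[:p] @ beta
    return intercept, beta

# Simulated example (hypothetical data): true intercept 1, slopes (2, -1).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + 0.5 * rng.normal(size=n)
X[:25, 0] = 50.0          # cellwise outliers in one covariate
X[25:50] = [2.5, 2.5]     # casewise outliers: ordinary-looking cells,
y[25:50] = -20.0          # but jointly inconsistent with the model

intercept, beta = three_step_regression(X, y)
print("intercept:", intercept, "slopes:", beta)
```

The coefficients depend on the scatter estimate only through the ratio of its blocks, so a scatter matrix that is correct up to a constant factor, as a trimmed covariance typically is, still gives usable slopes. Median imputation is the weakest part of this sketch; with strongly correlated covariates it would be expected to bias the result, which is one reason to prefer an estimator designed for incomplete data.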