Deriving optimal data-analytic regimes from benchmarking studies

Doove, Lisa L., Wilderjans, Tom F., Calcagnì, Antonio, Van Mechelen, Iven
Computational statistics & data analysis 2017 v.107 pp. 81-91
algorithms, biostatistics, data collection, models, statistical analysis
In benchmarking studies with simulated data sets in which two or more statistical methods are compared, one may, over and above searching for a universally winning method, investigate how the winning method varies over patterns of characteristics of the data or the data-generating mechanism. Interestingly, this problem bears strong formal similarities to the problem of finding optimal treatment regimes in biostatistics when two or more treatment alternatives are available for the same medical problem or disease. It is outlined how optimal data-analytic regimes, that is to say, rules for optimally choosing among statistical methods, can be derived from benchmarking studies with simulated data by means of supervised classification methods (e.g., classification trees). The approach is illustrated by means of analyses of data from a benchmarking study that compares two different algorithms for the estimation of a two-mode additive clustering model.
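
The idea of deriving a data-analytic regime with a supervised classifier can be illustrated with a minimal sketch. It is not the authors' procedure or data: the benchmark is invented, the data characteristics (`noise`, `size`), the performance model, and the helper names (`simulate_benchmark`, `fit_stump`, `recommend`) are all hypothetical, and a depth-1 classification tree (a decision stump) stands in for a full classification tree. Each simulated data set is labelled with its winning method, and the stump learns a rule that maps data characteristics to the method to call in.

```python
import random


def simulate_benchmark(n_datasets=200, seed=0):
    """Hypothetical benchmark: one row per simulated data set."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_datasets):
        noise = rng.uniform(0.0, 1.0)      # hypothetical data characteristic
        size = rng.choice([50, 100, 200])  # hypothetical data characteristic
        # Invented performance model: method A degrades with noise, B does not.
        perf_a = 1.0 - noise + rng.gauss(0, 0.05)
        perf_b = 0.5 + rng.gauss(0, 0.05)
        winner = "A" if perf_a > perf_b else "B"
        rows.append({"noise": noise, "size": size, "winner": winner})
    return rows


def majority(labels):
    """Most frequent label; ties are broken alphabetically."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(rows, features):
    """Depth-1 classification tree: keep the split with the fewest misclassified winners."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r["winner"] for r in rows if r[f] <= t]
            right = [r["winner"] for r in rows if r[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(w != l_lab for w in left) + sum(w != r_lab for w in right)
            if best is None or errors < best["errors"]:
                best = {"feature": f, "threshold": t,
                        "left": l_lab, "right": r_lab, "errors": errors}
    return best


def recommend(stump, characteristics):
    """Apply the learned regime to the characteristics of a new data set."""
    side = "left" if characteristics[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


rows = simulate_benchmark()
stump = fit_stump(rows, ["noise", "size"])
print(stump)
print(recommend(stump, {"noise": 0.1, "size": 100}))
```

The fitted stump is itself the regime: a rule of the form "if the characteristic is at most the threshold, use one method, otherwise use the other". A deeper tree would allow the recommended method to depend on combinations of characteristics.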