Main content area

Assessment of the performance of hidden Markov models for imputation in animal breeding

Whalen, Andrew, Gorjanc, Gregor, Ros-Freixedes, Roger, Hickey, John M.
Genetics, selection, evolution 2018 v.50 no.1 pp. 44
Beagle, Markov chain, algorithms, animal breeding, animals, data collection, diploidy, haploidy, human genetics, pedigree
BACKGROUND: In this paper, we review the performance of various hidden Markov model-based imputation methods in animal breeding populations. Traditionally, pedigree and heuristic-based imputation methods have been used for imputation in large animal populations due to their computational efficiency, scalability, and accuracy. Recent advances in the area of human genetics have increased the ability of probabilistic hidden Markov model methods to perform accurate phasing and imputation in large populations. These advances may enable these methods to be useful for routine use in large animal populations, particularly in populations where pedigree information is not readily available. METHODS: To test the performance of hidden Markov model-based imputation, we evaluated the accuracy and computational cost of several methods in a series of simulated populations and a real animal population without using a pedigree. First, we tested single-step (diploid) imputation, which performs both phasing and imputation. Second, we tested pre-phasing followed by haploid imputation. Overall, we used four available diploid imputation methods (fastPHASE, Beagle v4.0, IMPUTE2, and MaCH), three phasing methods, (SHAPEIT2, HAPI-UR, and Eagle2), and three haploid imputation methods (IMPUTE2, Beagle v4.1, and Minimac3). RESULTS: We found that performing pre-phasing and haploid imputation was faster and more accurate than diploid imputation. In particular, among all the methods tested, pre-phasing with Eagle2 or HAPI-UR and imputing with Minimac3 or IMPUTE2 gave the highest accuracies with both simulated and real data. CONCLUSIONS: The results of this study suggest that hidden Markov model-based imputation algorithms are an accurate and computationally feasible approach for performing imputation without a pedigree when pre-phasing and haploid imputation are used. Of the algorithms tested, the combination of Eagle2 and Minimac3 gave the highest accuracy across the simulated and real datasets.