Main content area

Comparison of haplotype-based statistical tests for disease association with rare and common variants

Datta, Ananda S., Biswas, Swati
Briefings in bioinformatics 2016 v.17 no.4 pp. 657-671
Bayesian theory, data collection, genetic analysis, haplotypes, hypertension, linear models, seeds, statistical analysis
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guidance to the practitioners. We consider nine methods—Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types—individual haplotype-specific tests and global tests depending on whether there is just one overall test for a haplotype region (global) or there is an individual test for each haplotype in the region. Haplo.score is the only method that tests for both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method—Sequence Kernel Association Test (SKAT) and its two variants—SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.