Meta-Analytic Framework for Sparse K-Means to Identify Disease Subtypes in Multiple Transcriptomic Studies

Huo, Zhiguang; Ding, Ying; Liu, Silvia; Oesterreich, Steffi; Tseng, George
Journal of the American Statistical Association 2016 v.111 no.513 pp. 27-42
artificial intelligence, breast neoplasms, data collection, equations, genes, leukemia, meta-analysis, models, phenotype, statistics, transcriptomics
Disease phenotyping by omics data has become a popular approach that potentially can lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step toward this goal. In this article, we extend a sparse K-means method toward a meta-analytic framework to identify novel disease subtypes when expression profiles of multiple cohorts are available. The lasso regularization and meta-analysis identify a unique set of gene features for subtype characterization. An additional pattern matching reward function guarantees consistent subtype signatures across studies. The method was evaluated by simulations and leukemia and breast cancer datasets. The identified disease subtypes from meta-analysis were characterized with improved accuracy and stability compared to single study analysis. The breast cancer model was applied to an independent METABRIC dataset and generated improved survival difference between subtypes. These results provide a basis for diagnosis and development of targeted treatments for disease subgroups. Supplementary materials for this article are available online.