DTMiner: identification of potential disease targets through biomedical literature mining

Xu, Dong, Zhang, Meizhuo, Xie, Yanping, Wang, Fan, Chen, Ming, Zhu, Kenny Q., Wei, Jia
Bioinformatics 2016 v.32 no.23 pp. 3619-3626
Internet, algorithms, bioinformatics, data collection, genes
Motivation: Biomedical researchers often search through massive catalogues of literature to look for potential relationships between genes and diseases. Given the rapid growth of biomedical literature, automatic relation extraction, a crucial technology in biomedical literature mining, has shown great potential to support research on gene-related diseases. Existing work in this field has produced datasets that are limited in both scale and accuracy.
Results: In this study, we propose a reliable and efficient framework that takes large biomedical literature repositories as input, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. The framework incorporates named entity recognition (NER), which identifies occurrences of genes and diseases in texts; association detection, in which we extract and evaluate features from gene–disease pairs; and ranking algorithms that estimate how closely the pairs are related. The F1-score of the NER phase is 0.87, which is higher than that reported in existing studies. The association detection phase takes drastically less time than previous work while maintaining a comparable F1-score of 0.86. The end-to-end result achieves an F1-score of 0.259 for the top 50 genes associated with a disease, outperforming previous work. In addition, we released a web service for public use of the dataset.
Availability and Implementation: The implementation of the proposed algorithms is publicly available at … The web service is available at …
Contact: … or …
Supplementary information: Supplementary data are available at Bioinformatics online.
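The end-to-end figure above is an F1-score measured at a top-50 cutoff. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of one common way to compute precision, recall and F1 over the top k entries of a ranked gene list. The function name `f1_at_k`, its parameters, and the placeholder gene identifiers are illustrative assumptions, not part of DTMiner.

```python
def f1_at_k(ranked_genes, relevant_genes, k=50):
    """Return (precision, recall, F1) for the top-k entries of a ranked list.

    Hypothetical helper for illustration only; it assumes the ranked list
    contains no duplicates and that `relevant_genes` is the gold-standard
    set of genes known to be associated with the disease.
    """
    top_k = ranked_genes[:k]
    relevant = set(relevant_genes)

    # Count how many of the top-k predictions are in the gold standard.
    hits = sum(1 for gene in top_k if gene in relevant)

    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with placeholder identifiers (not real results).
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gold = {"GENE_A", "GENE_D", "GENE_E"}
precision, recall, f1 = f1_at_k(ranked, gold, k=2)
```

Because recall is divided by the size of the full gold-standard set, a fixed cutoff such as k = 50 caps the achievable recall whenever a disease has more than 50 known genes, which is one reason top-k F1 values tend to be much lower than per-phase F1 values.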