Main content area

DAMpred: Recognizing Disease-Associated nsSNPs through Bayes-Guided Neural-Network Model Built on Low-Resolution Structure Prediction of Proteins and Protein–Protein Interactions

Quan, Lijun, Wu, Hongjie, Lyu, Qiang, Zhang, Yang
Journal of molecular biology 2019 v.431 no.13 pp. 2449-2459
algorithms, control methods, data collection, genes, genetic disorders, human diseases, human health, humans, neural networks, open reading frames, prediction, protein structure, protein-protein interactions, proteins, single nucleotide polymorphism
Nearly one-third of non-synonymous single-nucleotide polymorphism (nsSNPs) are deleterious to human health, but recognition of the disease-associated mutations remains a significant unsolved problem. We proposed a new algorithm, DAMpred, to identify disease-causing nsSNPs through the coupling of evolutionary profiles with structure predictions of proteins and protein–protein interactions. The pipeline was trained by a novel Bayes-guided artificial neural network algorithm that incorporates posterior probabilities of distinct feature classifiers with the network training process. DAMpred was tested on a large-scale data set involving 10,635 nsSNPs from 2154 ORFs in the human genome and recognized disease-associated nsSNPs with an accuracy 0.80 and a Matthews correlation coefficient of 0.601, which is 9.1% higher than the best of other state-of-the-art methods. In the blind test on the TP53 gene, DAMpred correctly recognized the mutations causative of Li–Fraumeni-like syndrome with a Matthews correlation coefficient that is 27% higher than the control methods. The study demonstrates an efficient avenue to quantitatively model the association of nsSNPs with human diseases from low-resolution protein structure prediction, which should find important usefulness in diagnosis and treatment of genetic diseases.