Main content area

Functional association prediction by community profiling

Jiao, Dazhi, Han, Wontack, Ye, Yuzhen
Methods 2017 v.129 pp. 8-17
DNA, amino acid sequences, genes, metagenomics, microbial communities, nucleotide sequences, phylogeny, prediction, proteins
Recent years have witnessed unprecedented accumulation of DNA sequences and therefore protein sequences (predicted from DNA sequences), due to the advances of sequencing technology. One of the major sources of the hypothetical proteins is the metagenomics research. Current annotation of metagenomes (collections of short metagenomic sequences or assemblies) relies on similarity searches against known gene/protein families, based on which functional profiles of microbial communities can be built. This practice, however, leaves out the hypothetical proteins, which may outnumber the known proteins for many microbial communities. On the other hand, we may ask: what can we gain from the large number of metagenomes made available by the metagenomic studies, for the annotation of metagenomic sequences as well as functional annotation of hypothetical proteins in general? Here we propose a community profiling approach for predicting functional associations between proteins: two proteins are predicted to be associated if they share similar presence and absence profiles (called community profiles) across microbial communities. Community profiling is conceptually similar to the phylogenetic profiling approach to functional prediction, however with fundamental differences. We tested different profile construction methods, the selection of reference metagenomes, and correlation metrics, among others, to optimize the performance of this new approach. We demonstrated that the community profiling approach alone slightly outperforms the phylogenetic profiling approach for associating proteins in species that are well represented by sequenced genomes, and combining phylogenetic and community profiling further improves (though only marginally) the prediction of functional association. Further we showed that community profiling method significantly outperforms phylogenetic profiling, revealing more functional associations, when applied to a more recently sequenced bacterial genome.