Classification of pathogenic microbes using a minimal set of single nucleotide polymorphisms derived from whole genome sequences

Roychowdhury, Tanmoy; Singh, Vinod Kumar; Bhattacharya, Alok
Genomics 2019 v.111 no.2 pp. 205-211
Escherichia coli, Mycobacterium tuberculosis, genetic markers, genetic variation, infectious diseases, microorganisms, pathogens, phenotypic variation, single nucleotide polymorphism
Intraspecies genomic variation plays an important, context-specific role in the phenotypic diversity observed among pathogenic microbes. Efficient classification of these pathogens is important for the diagnosis and treatment of several infectious diseases. Next-generation sequencing (NGS) technologies have provided access to a wealth of data that can be used to discover important markers for pathogen classification. In this paper, we describe three different approaches (Jensen-Shannon divergence, random forest, and Shewhart control chart) for identifying a minimal set of single nucleotide polymorphisms (SNPs) that can be used to classify organisms. These methods are generic and can be applied to any organism. We demonstrate the usefulness of these approaches by analyzing Mycobacterium tuberculosis and Escherichia coli isolates. We identified a minimal set of 18 SNPs that can be used as molecular markers for phylogroup-based classification and 8 SNPs for pathogroup-based classification of E. coli.
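
To make the first of the three approaches concrete, the sketch below scores each SNP position by the Jensen-Shannon divergence between the allele frequency distributions of two groups of isolates, then ranks positions by that score. It is an illustration only, not the authors' pipeline: the abstract does not specify how the divergence is applied, so the two-group comparison, the toy genotype calls, and the function names (`allele_distribution`, `js_divergence`, `rank_snps`) are all assumptions made here.

```python
import math
from collections import Counter

ALLELES = "ACGT"


def allele_distribution(calls):
    """Relative frequency of A, C, G, T in a list of base calls."""
    counts = Counter(calls)
    total = sum(counts[a] for a in ALLELES)
    if total == 0:
        raise ValueError("no A/C/G/T calls in this group")
    return [counts[a] / total for a in ALLELES]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_snps(genotypes, labels, group_a, group_b):
    """Rank SNP positions by how well they separate two labelled groups.

    genotypes: dict mapping position -> list of base calls, one per isolate
    labels:    list of group labels, in the same isolate order (hypothetical)
    """
    scores = {}
    for pos, calls in genotypes.items():
        a = [c for c, lab in zip(calls, labels) if lab == group_a]
        b = [c for c, lab in zip(calls, labels) if lab == group_b]
        scores[pos] = js_divergence(allele_distribution(a),
                                    allele_distribution(b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration): six isolates, three SNP positions.
labels = ["A", "A", "A", "B", "B", "B"]
genotypes = {
    101: list("AAAGGG"),  # allele fully separates the two groups
    202: list("CCCCCC"),  # invariant, carries no information
    303: list("TTATAA"),  # partially informative
}
ranked = rank_snps(genotypes, labels, "A", "B")
```

In this toy example, position 101 ranks first with a divergence of 1 bit and the invariant position 202 ranks last with 0. A minimal marker set could then be drawn from the top of such a ranking, although the selection criterion used in the paper is not stated in this abstract.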