Optimal choice of k-mer in composition vector method for genome sequence comparison

Das, Subhram, Deb, Tamal, Dey, Nilanjan, Ashour, Amira S., Bhattacharya, D.K., Tibarewala, D.N.
Genomics 2018 v.110 no.5 pp. 263-273
genes, proteins, sequence analysis
Several proteins and genes are members of families that share a common evolutionary history. Sequence comparison has emerged as an important process for outlining evolutionary relationships and recognizing conserved patterns. The current work critically investigates the role of the k-mer in the composition vector method for comparing genome sequences. Generally, composition vector methods based on k-mers are applied with different choices of k to compare genome sequences. For some values of k the results are satisfactory, but for other values they are not. In the proposed work, the standard composition vector method is carried out using a 3-mer string length. In addition, a special type of information-based similarity index is used as the distance measure. The work establishes that the use of 3-mers together with the information-based similarity index provides satisfactory results in all cases, especially for the comparison of whole genome sequences. These selections provide a kind of unified approach to the comparison of genome sequences.
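To make the idea of a 3-mer composition vector concrete, the following is a minimal Python sketch. It is an illustration only and not the authors' implementation: it uses plain normalized 3-mer frequencies without the background correction that composition vector methods normally apply, and because the abstract does not define the information-based similarity index, cosine distance is used here as a stand-in. The function names `kmer_frequencies` and `cosine_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: DNA sequences; ambiguous bases such as N are ignored


def kmer_frequencies(seq, k=3):
    """Return the normalized k-mer frequency vector of seq (length 4**k).

    Simplified stand-in for a composition vector: raw frequencies only,
    with no subtraction of a background (random) model.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed lexicographic ordering of all possible k-mers over the alphabet.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing characters outside the alphabet are not counted.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_distance(u, v):
    """Cosine distance between two vectors (stand-in distance measure,
    NOT the information-based similarity index used in the paper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    a = kmer_frequencies("ACGTACGTGGCA", k=3)
    b = kmer_frequencies("ACGTTCGTGGAA", k=3)
    print(cosine_distance(a, b))
```

With k = 3 each genome is reduced to a vector of 64 entries regardless of its length, which is what makes alignment-free comparison of whole genomes practical; a pairwise distance matrix built this way could then be passed to a standard tree-building method.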