PubAg

Optimal choice of k-mer in composition vector method for genome sequence comparison

Author:
Das, Subhram; Deb, Tamal; Dey, Nilanjan; Ashour, Amira S.; Bhattacharya, D.K.; Tibarewala, D.N.
Source:
Genomics 2018 v.110 no.5 pp. 263-273
ISSN:
0888-7543
Subject:
genes, proteins, sequence analysis
Abstract:
Many proteins and genes belong to families that share a common evolutionary history. Sequence comparison has become an essential step in outlining evolutionary relationships and recognizing conserved patterns. The current work critically investigates the role of the k-mer in the composition vector method for comparing genome sequences. Composition vector methods are generally applied with different choices of k; for some values of k the results are satisfactory, but for others they are not. In the proposed work, the standard composition vector method is carried out using a string length of 3 (3-mers). In addition, a special type of information-based similarity index is used as the distance measure. The work establishes that the combination of 3-mers and the information-based similarity index gives satisfactory results in all cases, especially for the comparison of whole genome sequences. These choices provide a kind of unified approach to the comparison of genome sequences.
Agid:
6111524
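
Illustrative sketch:
The abstract describes comparing genomes through 3-mer composition vectors and an information-based similarity index, but does not define either in detail. The Python sketch below is only a rough illustration under stated assumptions: it builds a plain normalized 3-mer frequency vector (without the background correction that some composition vector variants apply), and it uses the Jensen-Shannon divergence as a stand-in information-based measure, which is not necessarily the index used by the authors. The function names `kmer_frequencies` and `js_divergence` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=3):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries, ordered lexicographically over ACGT.
    Windows containing any other symbol (e.g. N) are skipped.
    This is a simplified composition vector: no background subtraction.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency vectors.

    Assumption: used here as a placeholder for the paper's
    information-based similarity index, which the abstract does not specify.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    a = kmer_frequencies("ACGTACGTTGCA")
    b = kmer_frequencies("ACGTTCGATGCA")
    print(js_divergence(a, b))
```

With base-2 logarithms this measure ranges from 0 (identical 3-mer composition) to 1 (no 3-mers in common), so a matrix of pairwise values could be passed to a standard tree-building method. Fixing `k=3` mirrors the choice reported in the abstract; other values of k can be tried by changing the argument.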