Protein fingerprinting with digital sequences of linear protein subsequence volumes: a computational study

Sampath, G
Journal of biosciences 2019 v.44 no.2 pp. 54
Helicobacter pylori, nanopores, peptide mapping, polymers, proteins, proteolysis, proteome
Proteins in a proteome can be identified from a sequence of K integers equal to the digitized volumes of subsequences with L residues from the primary sequence of a stretched protein. Exhaustive computations on the proteins of Helicobacter pylori (UniProt id UP000000210) with L and K in the range 4–8 show that ~90% of the proteins can be identified uniquely in this manner. This computational result can be translated into practice with a nanopore, an emerging technology that does not require analyte immobilization, proteolysis or labeling. Unlike other methods, most of which focus on a specific target protein, nanopore-based methods enable the identification of multiple proteins from a sample in a single run. Recent work by Kennedy, Kolmogorov and associates shows that the blockade current due to a protein molecule translocating through a nanopore is roughly proportional to the volume of one or more contiguous residues. The present study points to a modified version in which the volumes of subsequences (rather than of single residues) may be obtained by integrating the blockade current due to L contiguous residues. The advantages arising from this include lower detector bandwidth, elimination of the homopolymer problem and reduced noise. Because an identifier is based on near as well as distant (up to 2KL-L) residues, this approach uses more global information than an approach based on single residues and short-range correlations. The results of the study, which are available in a data supplement, are discussed in detail. Potential implementation issues are addressed.
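
The identifier described in the abstract (K integers, each the digitized volume of an L-residue subsequence) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the function names `fingerprint` and `unique_fraction`, the `bin_width` digitization step, the `start` and `stride` parameters, and the residue volume table, which uses commonly quoted approximate values in cubic angstroms. With `stride` equal to 2L the K windows span 2KL-L residues, the figure quoted in the abstract, but whether that matches the paper's window layout is likewise an assumption.

```python
from collections import Counter

# Approximate residue volumes in cubic angstroms (commonly quoted values;
# an assumption here, the paper's own table may differ).
RESIDUE_VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}


def fingerprint(seq, L=4, K=4, stride=None, bin_width=10.0, start=0):
    """Return a tuple of K digitized window volumes, or None.

    None is returned when the sequence is too short for K windows or
    when a window contains a residue missing from RESIDUE_VOLUME.
    """
    stride = L if stride is None else stride  # default: back-to-back windows
    span = (K - 1) * stride + L               # residues covered by the K windows
    if start + span > len(seq):
        return None
    codes = []
    for k in range(K):
        lo = start + k * stride
        window = seq[lo:lo + L]
        if any(r not in RESIDUE_VOLUME for r in window):
            return None
        volume = sum(RESIDUE_VOLUME[r] for r in window)
        codes.append(int(volume // bin_width))  # hypothetical digitization step
    return tuple(codes)


def unique_fraction(proteome, **kwargs):
    """Fraction of proteins whose fingerprint is shared with no other protein.

    `proteome` maps a protein name to its primary sequence.
    """
    if not proteome:
        return 0.0
    prints = {name: fingerprint(seq, **kwargs) for name, seq in proteome.items()}
    counts = Counter(fp for fp in prints.values() if fp is not None)
    unique = [n for n, fp in prints.items() if fp is not None and counts[fp] == 1]
    return len(unique) / len(proteome)
```

For example, `fingerprint("AAAAGGGG", L=4, K=2)` returns `(35, 24)` under these assumptions. Applying `unique_fraction` to a full proteome would mimic the kind of exhaustive count the abstract reports, though this sketch is not expected to reproduce the paper's ~90% figure without its actual volume table and digitization scheme.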