An NGS-Independent Strategy for Proteome-Wide Identification of Single Amino Acid Polymorphisms by Mass Spectrometry

Xiong, Yun, Guo, Yufeng, Xiao, Weidi, Cao, Qichen, Li, Shanshan, Qi, Xianni, Zhang, Zhidan, Wang, Qinhong, Shui, Wenqing
Analytical chemistry 2016 v.88 no.5 pp. 2784-2791
amino acids, data collection, databases, genome, heat tolerance, high-throughput nucleotide sequencing, mass spectrometry, proteins, proteome, proteomics, sequence diversity, single nucleotide polymorphism, synthetic peptides, yeasts
Detection of proteins containing single amino acid polymorphisms (SAPs) encoded by nonsynonymous SNPs (nsSNPs) can aid researchers in studying the functional significance of protein variants. Most proteogenomic approaches for large-scale SAP mapping require construction of a sample-specific database containing protein variants predicted from next-generation sequencing (NGS) data. Searching shotgun proteomic data sets against these NGS-derived databases allows identification of SAP peptides, thus validating the proteome-level sequence variation. In contrast to these conventional approaches, our study presents a novel strategy for proteome-wide SAP detection that does not rely on sample-specific NGS data. By searching a deep-coverage proteomic data set from an industrial thermotolerant yeast strain using our strategy, we identified 337 putative SAPs relative to the reference genome. Among the SAP peptides identified with stringent criteria, 85.2% of SAP sites were validated using whole-genome sequencing data obtained for this organism, which indicates high accuracy of SAP identification with our strategy. More interestingly, for certain SAP peptides that cannot be predicted by genomic sequencing, we used synthetic peptide standards to verify expression of the peptide variants in the proteome. Our study provides a unique tool for proteogenomics that enables direct proteome-wide SAP identification and captures nongenetic protein variants not linked to nsSNPs.