Mining mass spectrometry data: Using new computational tools to find novel organic compounds in complex environmental mixtures

Longnecker, Krista; Kujawinski, Elizabeth B.
Organic geochemistry 2017 v.110 pp. 92-99
Bacillariophyceae, Cyanobacteria, Thalassiosira, carbon cycle, chemical elements, computer software, data collection, fatty acids, laboratory experimentation, marine environment, mass spectrometry, metabolites, metabolomics, mining, models, organic matter, Atlantic Ocean
Untargeted metabolomics datasets provide ample opportunity for discovery of novel metabolites. The major challenge is focusing data analysis on a short list of metabolites. Here, we apply a combination of computational tools that reduce complex mass spectrometry data so that we can focus on new, environmentally relevant metabolites. In the first portion of the project, we explored mass spectrometry data from intracellular metabolites extracted from a model marine diatom, Thalassiosira pseudonana. The fragmentation data from these samples were analyzed using molecular networking, an online tool that clusters metabolites based on shared structural similarities. The features within each metabolite cluster were then putatively annotated using MetFrag, an in silico fragmentation tool. Using this combination of computational tools, we observed multiple lyso-sulfolipids, organic compounds not previously known to exist within cultured marine diatoms. In the second stage of the project, we searched our environmental data for these lyso-sulfolipids. The lyso-sulfolipid with a C14:0 fatty acid was found in dissolved and particulate samples from the western Atlantic Ocean, and in a culture of cyanobacteria grown in our laboratory. Thus, the putative lyso-sulfolipids are present in both laboratory experiments and environmental samples. This project highlights the value of combining computational tools to detect and putatively identify organic compounds not previously recognized as important within T. pseudonana or the marine environment. Future applications of these tools to emerging metabolomics data will further open the black box of natural organic matter, identifying molecules that can be used to understand and monitor the global carbon cycle.
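
The clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline and not the molecular networking tool itself: it assumes that fragmentation spectra are compared with a plain cosine score on binned peaks (networking tools generally use more refined scoring), and that features whose score passes a threshold are linked into the same cluster. The function names, the 0.7 threshold, the 0.01 bin width, and the toy peak lists are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks to {bin index: summed intensity}."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_score(a, b):
    """Plain cosine similarity between two binned spectra (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_spectra(spectra, threshold=0.7, bin_width=0.01):
    """Group features into connected components of the similarity graph.

    Two features are linked when their cosine score is >= threshold
    (an assumed cutoff, chosen only for illustration).
    """
    names = list(spectra)
    binned = {n: bin_spectrum(spectra[n], bin_width) for n in names}
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_score(binned[a], binned[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for n in names:
        clusters[find(n)].append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy spectra: feature_a and feature_b share two fragments,
# feature_c shares none. These are not measured values.
toy_spectra = {
    "feature_a": [(100.00, 50.0), (150.00, 100.0), (200.00, 20.0)],
    "feature_b": [(100.00, 45.0), (150.00, 100.0), (250.00, 10.0)],
    "feature_c": [(80.00, 100.0), (120.00, 60.0)],
}
clusters = cluster_spectra(toy_spectra)
```

In a workflow like the one described, the members of each resulting cluster would then be passed to an in silico fragmentation tool for putative annotation; that step is not sketched here.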