Modules

Overview

TurboPutative is a bioinformatics tool that streamlines putative annotation in metabolomics. It handles and simplifies data matrices, facilitates data visualization and the mining of useful information and, ultimately, helps prioritize which candidate metabolites to investigate further. TurboPutative takes as input a table formatted as shown in the example file. When TableMerger is used, a second table must also be uploaded.

TurboPutative is designed to handle data obtained by untargeted metabolomics based on liquid chromatography coupled to mass spectrometry (with high-resolution analyzers) using electrospray ionization (LC/ESI-MS). In this type of study, the list of candidate metabolites is generally obtained using tools (e.g. CEU Mass Mediator 3.0 or MetaboSearch) that compare the experimental monoisotopic mass of a feature with the theoretical mass of compounds listed in one or several databases (e.g. HMDB, METLIN, KEGG, LipidMaps).


Features detected

Experimental mass
336.8786
98.0577
189.0468

Candidate metabolites

Experimental mass | Adduct  | mz Error (ppm) | Name
336.8786          | M+H-H2O | 7              | Bithionol
336.8786          | M+H-H2O | 7              | Tetradifon
98.0577           | M+Na    | 1              | Trimethylamine-N-oxide
98.0577           | M+Na    | 1              | 1-Amino-propan-2-ol
98.0577           | M+Na    | 1              | 3-aminopropan-1-ol
189.0468          | M+H     | 1              | 2-Chlorobiphenyl
189.0468          | M+H     | 1              | 4-Chlorobiphenyl
189.0468          | M+Na    | 7              | Ethionamide
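
The mass comparison behind such a candidate list can be sketched as follows. This is only an illustration, not the code of any of the tools mentioned: the function names and the small lookup dictionaries are assumptions made for the example. The element masses and adduct shifts are standard monoisotopic values.

```python
# Monoisotopic masses (Da) of the elements needed for this example.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Mass shift (Da) that each adduct adds to the neutral molecule (positive mode).
# Illustrative subset; real tools support many more adducts.
ADDUCT_SHIFT = {"M+H": 1.007276, "M+Na": 22.989218}


def monoisotopic_mass(element_counts):
    """Sum the monoisotopic masses of the atoms in a formula."""
    return sum(ELEMENT_MASS[element] * n for element, n in element_counts.items())


def ppm_error(experimental_mz, theoretical_mz):
    """Relative difference between measured and theoretical m/z, in ppm."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


# Trimethylamine-N-oxide has the formula C3H9NO.
neutral_mass = monoisotopic_mass({"C": 3, "H": 9, "N": 1, "O": 1})
theoretical_mz = neutral_mass + ADDUCT_SHIFT["M+Na"]
error = ppm_error(98.0577, theoretical_mz)
print(f"theoretical m/z = {theoretical_mz:.4f}, error = {error:.2f} ppm")
```

Because 1-Amino-propan-2-ol and 3-aminopropan-1-ol share the formula C3H9NO, they yield the same theoretical m/z and cannot be told apart by mass alone.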

Despite the use of high-resolution analyzers (Orbitrap or TOF), a large number of candidate metabolites can be assigned to the same feature. Several metabolites share the same molecular formula but differ in structure, which makes them distinct compounds with different chemical and physical properties. On top of that, different metabolites can yield very similar m/z values when ionized with different adducts. The result is a long and complicated matrix with several candidates per feature.

For these reasons we have developed TurboPutative, which helps researchers determine which metabolites are most likely and spares them the laborious and time-consuming process of manually reviewing each entry (comparing the associated errors and adducts, and frequently searching databases to identify possible dietary, drug or microbial compounds, etc.).

TurboPutative allows “metabolomists” to proceed quickly with the biological interpretation of the results, or to prioritize both the purchase of authentic chemical standards and the analyses in MS/MS mode for accurate metabolite identification (view sample results).


Candidate metabolites (Input)

Experimental mass | Adduct  | mz Error (ppm) | Name
336.8786          | M+H-H2O | 7              | Bithionol
336.8786          | M+H-H2O | 7              | Tetradifon
98.0577           | M+Na    | 1              | Trimethylamine-N-oxide
98.0577           | M+Na    | 1              | 1-Amino-propan-2-ol
98.0577           | M+Na    | 1              | 3-aminopropan-1-ol
189.0468          | M+H     | 1              | 2-Chlorobiphenyl
189.0468          | M+H     | 1              | 4-Chlorobiphenyl
189.0468          | M+Na    | 7              | Ethionamide

Tagger

Experimental mass | Adduct  | mz Error (ppm) | Name                   | Halogenated | Microbial | Drug
336.8786          | M+H-H2O | 7              | Bithionol              | x           |           |
336.8786          | M+H-H2O | 7              | Tetradifon             | x           |           |
98.0577           | M+Na    | 1              | Trimethylamine-N-oxide |             | MC        |
98.0577           | M+Na    | 1              | 1-Amino-propan-2-ol    |             |           |
98.0577           | M+Na    | 1              | 3-aminopropan-1-ol     |             |           |
189.0468          | M+H     | 1              | 2-Chlorobiphenyl       | x           |           |
189.0468          | M+H     | 1              | 4-Chlorobiphenyl       | x           |           |
189.0468          | M+Na    | 7              | Ethionamide            |             |           | Drug

REname

Experimental mass | Adduct  | mz Error (ppm) | Name                 | Halogenated | Microbial | Drug
336.8786          | M+H-H2O | 7              | Bithionol            | x           |           |
336.8786          | M+H-H2O | 7              | Tetradifon           | x           |           |
98.0577           | M+Na    | 1              | Trimethylamine-oxide |             | MC        |
98.0577           | M+Na    | 1              | Amino-propan-ol      |             |           |
189.0468          | M+H     | 1              | Chlorobiphenyl       | x           |           |
189.0468          | M+Na    | 7              | Ethionamide          |             |           | Drug

RowMerger

Experimental mass | Adduct  | mz Error (ppm) | Name                    | Halogenated | Microbial | Drug
336.8786          | M+H-H2O | 7              | Bithionol // Tetradifon | x           |           |
98.0577           | M+Na    | 1              | Trimethylamine-oxide    |             | MC        |
98.0577           | M+Na    | 1              | Aminopropan-ol          |             |           |
189.0468          | M+H     | 1              | Chlorobiphenyl          | x           |           |
189.0468          | M+Na    | 7              | Ethionamide             |             |           | Drug

Tagger

Tagger is a classifier that detects metabolites characterized as nutrients, drugs, microbial compounds, natural products, plant compounds, halogenated compounds or peptides. The classification is performed using regular expressions and predefined lists extracted from different databases. Each classification is made as follows:

  • Nutrients: Nutrients are classified using a predefined list of compounds extracted from the HMDB. Specifically, the list contains compounds characterized in HMDB as "Food" or "Food and Nutrition", excluding those of endogenous origin ("Endogenous"). The list was enriched with synonyms of the compounds extracted from the PubChem database.
  • Drugs: Drugs are classified using a predefined list of compounds extracted from the HMDB and DrugBank. Specifically, the list contains all DrugBank compounds and those metabolites characterized in HMDB as "Drug" or "Pharmaceutical industry". In both cases, molecules characterized as “Endogenous” in the HMDB were excluded. The list was enriched with synonyms of the compounds extracted from the PubChem database.
  • Plants: Plant metabolites are classified using a predefined list of compounds extracted from the PlantCyc database, excluding molecules characterized as “Endogenous” in the HMDB. The list was enriched with synonyms of the compounds extracted from the PubChem database.
  • Natural Products: Natural products are classified using a predefined list of compounds extracted from the LOTUS database, excluding molecules characterized as “Endogenous” in the HMDB. The list was enriched with synonyms of the compounds extracted from the PubChem database.
  • Microbiota-dependent: Microbiota-dependent metabolites are classified using an in-house list of compounds together with the metabolites contained in Metabolomics Data Explorer (Sonnenburg Laboratory), enriched with their synonyms from the PubChem database.
  • Halogens: Halogenated compounds are classified using regular expressions. If the input table contains a column with the “Molecular Formula” of the compounds, a regular expression that matches halogen elements is applied to it. Otherwise, a regular expression that detects the presence of halogens in the compound name is applied.
  • Peptides: Peptides are classified using regular expressions that identify compounds made up solely of amino acids.

Halogens

Regular Expression: ([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od)
Compound 1: 14,14,14-Trifluoro-11E-tetradecenyl acetate
Compound 2: 6-(2-Chloroallylthio)purine
Compound 3: Bromhexine
Compound 4: Fluorene
Compound 5: Chlorophyl

Peptides

Regular Expression: (?i)^(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|[-,\s]){3,}$
Compound 1: Pro Gly
Compound 2: Glu Ala
Compound 3: Gly Leu Cys
Compound 4: Met-Ala-Ala
Compound 5: Ile,Gly,Cys
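
The two name-based regular expressions listed above can be tried directly in Python. The snippet below is a minimal sketch: the patterns are the ones shown, while the function names `is_halogenated` and `is_peptide` are hypothetical and not part of TurboPutative.

```python
import re

# Patterns copied from the tables above.
HALOGEN_NAME_RE = re.compile(r"([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od)")
PEPTIDE_RE = re.compile(
    r"(?i)^(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser"
    r"|Thr|Trp|Tyr|Val|[-,\s]){3,}$"
)


def is_halogenated(name):
    """True if the compound name contains a halogen-related fragment."""
    return HALOGEN_NAME_RE.search(name) is not None


def is_peptide(name):
    """True if the name consists only of three-letter amino acid codes
    and separators (hyphen, comma or whitespace)."""
    return PEPTIDE_RE.match(name) is not None


halogen_examples = [
    "14,14,14-Trifluoro-11E-tetradecenyl acetate",
    "6-(2-Chloroallylthio)purine",
    "Bromhexine",
    "Fluorene",
    "Chlorophyl",
]
for compound in halogen_examples:
    print(compound, is_halogenated(compound))
```

The negative lookaheads `(?!ene)` and `(?!ophyl)` are what keep Fluorene and Chlorophyl from being flagged, even though their names contain "Fluor" and "Chlor".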

After execution, the program generates a table similar to the input table with an additional column for each classification performed:

Input:

Experimental mass | Name
102.0681          | Isovalerate
162.1131          | Ethyl levulinate
195.0876          | L-Pinitol
274.123           | Gly Leu Cys
401.2131          | 6alpha-Fluoroprogesterone

Output:

Experimental mass | Name                      | Peptide | Halogenated | Drug | Food
102.0681          | Isovalerate               |         |             |      |
162.1131          | Ethyl levulinate          |         |             |      | Food
195.0876          | L-Pinitol                 |         |             | Drug | Food
274.123           | Gly Leu Cys               | Peptide |             |      |
401.2131          | 6alpha-Fluoroprogesterone |         | x           |      |
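
For the list-based classifications, adding one column per tag amounts to a lookup of each compound name in the corresponding reference list. The sketch below illustrates the idea under stated assumptions: the function `tag_rows` is hypothetical, the two reference lists are toy stand-ins for the database-derived lists described above, and case-insensitive matching is an assumption.

```python
def tag_rows(rows, reference_lists):
    """Return copies of the rows with one extra column per classification.

    reference_lists maps a tag (column name) to a set of lower-cased names.
    """
    tagged = []
    for row in rows:
        new_row = dict(row)
        key = row["Name"].strip().lower()  # assumed: case-insensitive lookup
        for tag, names in reference_lists.items():
            new_row[tag] = tag if key in names else ""
        tagged.append(new_row)
    return tagged


# Toy lists for illustration only, not the real HMDB/DrugBank-derived lists.
reference_lists = {
    "Drug": {"l-pinitol"},
    "Food": {"ethyl levulinate", "l-pinitol"},
}

rows = [
    {"Experimental mass": 102.0681, "Name": "Isovalerate"},
    {"Experimental mass": 162.1131, "Name": "Ethyl levulinate"},
    {"Experimental mass": 195.0876, "Name": "L-Pinitol"},
]

tagged_rows = tag_rows(rows, reference_lists)
for row in tagged_rows:
    print(row)
```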

REname

Many of the annotated candidate metabolites are isomers or identical compounds with different nomenclature. In untargeted metabolomics based on LC/MS or CE/MS it is not possible to identify the position of double bonds and functional groups without performing further analysis in MS/MS mode. Therefore, the table with candidate metabolites contains numerous generic annotations, with many repetitions and limited utility for the researcher.

REname facilitates the extraction and visualization of useful information by identifying isomers and equivalent compounds and merging them under a single annotation.

Input:

Experimental mass | Name
131.0582          | N-acetyl-L-Alanine
131.0582          | N-Acetyl-beta-alanine
217.1045          | (2R,3R)-heptane-1,2,3-triol
217.1045          | heptane-1,2,3-triol
311.1262          | 6E,8E,14E-Hexadecatriene-10,12-diynoic acid
311.1262          | 6E,8E,14Z-Hexadecatriene-10,12-diynoic acid

Output:

Experimental mass | Name
131.0582          | acetyl-Alanine
217.1045          | heptane-triol
311.1262          | Hexadecatriene-diynoate

For this purpose, REname uses a dictionary of more than 15,000 compounds, each associated with its simplified or generic name. If a compound is not in the dictionary, REname processes it by sequentially applying a set of regular expressions that simplify the compound name.

Step        | Operation | Pattern / replacement   | Compound name
Input name  |           |                         | Hexadecatriene-10,12-diynoic acid
Section 1   | Regex     | ([-\s])(\d+[,\s]{,2})+- | Hexadecatriene-10,12-diynoic acid
            | Replace   | \g<1>                   | Hexadecatriene-diynoic acid
Section 2   | Regex     | (?i)ic acid             | Hexadecatriene-diynoic acid
            | Replace   | ate                     | Hexadecatriene-diynoate
Output name |           |                         | Hexadecatriene-diynoate
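
The two sections shown above can be chained in Python as a minimal sketch. Only these two regular expressions come from the example; the list name `SIMPLIFICATION_STEPS` and the function `simplify_name` are hypothetical, and the real set of expressions applied by REname is larger.

```python
import re

# (pattern, replacement) pairs applied in order, taken from the example above.
SIMPLIFICATION_STEPS = [
    # Section 1: drop position numbers such as "-10,12-", keeping the separator.
    (re.compile(r"([-\s])(\d+[,\s]{,2})+-"), r"\g<1>"),
    # Section 2: turn "...ic acid" into "...ate".
    (re.compile(r"(?i)ic acid"), "ate"),
]


def simplify_name(name):
    """Apply every simplification step to the name, one after another."""
    for pattern, replacement in SIMPLIFICATION_STEPS:
        name = pattern.sub(replacement, name)
    return name


print(simplify_name("Hexadecatriene-10,12-diynoic acid"))
print(simplify_name("heptane-1,2,3-triol"))
```

Applying the steps in a fixed order matters, since each expression works on the output of the previous one.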

However, peptides and lipids that have fatty acids in their structure (e.g. phospholipids, sphingolipids and glycerolipids) are not processed by means of regular expressions.

For peptides, REname combines under a single annotation those that have the same amino acid composition. For this to work, amino acids must be written in three-letter nomenclature.

Input:

Experimental mass | Name
388.1555          | Pro Pro Met
388.1555          | Pro Met Pro
388.1555          | Met Pro Pro
433.1408          | Trp Asp Asp
433.1408          | Asp Asp Trp
433.1408          | Asp Trp Asp

Output:

Experimental mass | Name
388.1555          | Met Pro Pro
433.1408          | Trp Asp Asp
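
Grouping peptides by amino acid composition can be sketched by sorting the residues of each name and using the result as a grouping key. This is an illustration only: the function `group_by_composition` is hypothetical, and which of the grouped names REname keeps as the final annotation is not modelled here.

```python
import re


def group_by_composition(peptide_names):
    """Group peptide names that contain the same residues in any order.

    Returns a dict mapping the sorted residue tuple to the original names.
    """
    groups = {}
    for name in peptide_names:
        residues = re.split(r"[-,\s]+", name.strip())
        key = tuple(sorted(residue.capitalize() for residue in residues))
        groups.setdefault(key, []).append(name)
    return groups


peptides = [
    "Pro Pro Met", "Pro Met Pro", "Met Pro Pro",
    "Trp Asp Asp", "Asp Asp Trp", "Asp Trp Asp",
]
groups = group_by_composition(peptides)
for composition, names in groups.items():
    print(composition, names)
```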

For lipids containing fatty acids, compound names are processed using the Goslin package. Goslin parses lipid names from different databases (e.g. LipidMaps, SwissLipids, HMDB) and classifies them through a system of parsers and predefined grammars. The information extracted from the lipid name is stored in an object with a fixed structure, which makes the information of interest easy to access. REname extracts the following information from lipid compounds:

  • Head group (e.g. PE or phosphoethanolamine).
  • Total number of carbon atoms in fatty acids.
  • Total number of double bonds in fatty acids.
  • Type of bond (ether or vinyl ether) of fatty acids.
  • Number of hydroxyl and methyl groups in fatty acids.

Input:

Experimental mass | Name
793.5565          | PE-Cer(d16:2(4E,6E)/24:0(2OH))
800.5444          | PE(18:0(10(R)Me)/16:0)
812.5478          | PC(15:0/18:1(9Z))
812.5478          | PC(15:1(9Z)/18:0)
869.5545          | PI(18:3(6Z,9Z,12Z)/20:0)
869.5545          | PI(20:2(11Z,14Z)/18:1(9Z))

Output:

Experimental mass | Name
793.5565          | PE-Cer(40:2(2OH))
800.5444          | PE(34:0(Me))
812.5478          | PC(33:1)
869.5545          | PI(38:3)
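
The idea of collapsing a lipid name to its head group plus total carbons and double bonds can be illustrated with a small regex-based sketch. Note that this is a deliberately simplified stand-in and not Goslin: it does not use Goslin's grammars, and it ignores bond types and hydroxyl or methyl groups, so it reproduces only the carbon and double-bond totals of the examples above. The function name `sum_composition` is hypothetical.

```python
import re

# "HeadGroup(chains)" where chains look like "15:0/18:1(9Z)".
LIPID_RE = re.compile(r"^([^()]+)\((.*)\)$")
# One "carbons:double_bonds" pair per fatty acid chain.
CHAIN_RE = re.compile(r"(\d+):(\d+)")


def sum_composition(name):
    """Return 'Head(total_carbons:total_double_bonds)' for a lipid name.

    Names that do not follow the expected shape are returned unchanged.
    """
    match = LIPID_RE.match(name.strip())
    if match is None:
        return name
    head_group, chains = match.groups()
    pairs = CHAIN_RE.findall(chains)
    if not pairs:
        return name
    carbons = sum(int(c) for c, _ in pairs)
    double_bonds = sum(int(d) for _, d in pairs)
    return f"{head_group}({carbons}:{double_bonds})"


for lipid in ["PC(15:0/18:1(9Z))", "PC(15:1(9Z)/18:0)", "PI(18:3(6Z,9Z,12Z)/20:0)"]:
    print(lipid, "->", sum_composition(lipid))
```

Once two names reduce to the same string, as the two PC entries do, they can be merged under a single annotation.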

RowMerger

RowMerger is an entity comparer that combines information from different annotations according to user-defined criteria, i.e. it groups annotations assigned to the same feature into a single entry. This step makes data visualization more consistent and therefore eases the interpretation of the results.

To execute RowMerger, the user can specify which annotation properties (i.e. table columns) will be considered during the merger and which ones will be kept in the resulting table.

By default, RowMerger will combine annotations that have the same Experimental mass, Adduct and mz Error (ppm), retaining the Identifier and the Name of merged annotations.

Input:

Experimental mass | Identifier | Adduct | Name
83.0607           | 154396     | M+H    | Methylimidazole
83.0607           | 61184      | M+H    | Fomepizole
171.0628          | 150549     | M+Na   | Paratose
171.0628          | 158202     | M+Na   | Abequose
171.0628          | 151197     | M+Na   | Tyvelose

Output:

Experimental mass | Identifier                 | Adduct | Name
83.0607           | 154396 // 61184            | M+H    | Methylimidazole // Fomepizole
171.0628          | 150549 // 158202 // 151197 | M+Na   | Paratose // Abequose // Tyvelose
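
The merge shown above can be sketched as a group-by on the compared columns followed by a join of the retained columns. The function `merge_rows` and its parameter names are hypothetical; only the " // " separator and the example data come from the tables above.

```python
def merge_rows(rows, compare_columns, keep_columns, separator=" // "):
    """Merge rows that agree on compare_columns; join keep_columns values."""
    merged = {}
    for row in rows:
        key = tuple(row[column] for column in compare_columns)
        if key not in merged:
            entry = {column: row[column] for column in compare_columns}
            entry.update({column: [] for column in keep_columns})
            merged[key] = entry
        for column in keep_columns:
            merged[key][column].append(str(row[column]))
    # Join the collected values; compared columns are passed through unchanged.
    return [
        {
            column: separator.join(value) if column in keep_columns else value
            for column, value in entry.items()
        }
        for entry in merged.values()
    ]


rows = [
    {"Experimental mass": 83.0607, "Identifier": 154396, "Adduct": "M+H", "Name": "Methylimidazole"},
    {"Experimental mass": 83.0607, "Identifier": 61184, "Adduct": "M+H", "Name": "Fomepizole"},
    {"Experimental mass": 171.0628, "Identifier": 150549, "Adduct": "M+Na", "Name": "Paratose"},
    {"Experimental mass": 171.0628, "Identifier": 158202, "Adduct": "M+Na", "Name": "Abequose"},
    {"Experimental mass": 171.0628, "Identifier": 151197, "Adduct": "M+Na", "Name": "Tyvelose"},
]

result = merge_rows(rows, ["Experimental mass", "Adduct"], ["Identifier", "Name"])
for row in result:
    print(row)
```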

TPMetrics

TPMetrics applies a multi-criteria scoring algorithm that incorporates analytical correlations and ionization probabilities to identify the most probable annotation. It also combines two datasets into a single data matrix: the one obtained after data curation by TurboPutative and another containing data deemed useful for interpreting the results.

TPMetrics therefore takes as input two tables specified by the user. Equivalent features in the two tables are matched by their mass and retention time (if retention time is not present, only mass is used). In addition, one of the tables must contain intensity values, which are needed to calculate the score.

Putative Annotations

Experimental mass | Name
83.0607           | Methylimidazole
104.107           | Neurine
83.0607           | Valinol
197.1009          | Arginine

+

Additional information

Feature | Experimental mass | RT [min] | I1   | I2   | I3
A08295  | 83.0607           | 10.82    | 1.12 | 1.19 | 1.15
A01122  | 104.10704         | 0.338    | 0.91 | 0.97 | 0.95
A01178  | 197.10092         | 0.339    | 0.98 | 0.91 | 0.93

Output:

Feature | Experimental mass | RT [min] | Name            | TPMetrics | I1   | I2   | I3
A08295  | 83.0607           | 10.82    | Methylimidazole | 25.37     | 1.12 | 1.19 | 1.15
A01122  | 104.107           | 0.338    | Neurine         | 12.02     | 0.91 | 0.97 | 0.95
A08295  | 83.0607           | 10.82    | Valinol         | 4.12      | 1.12 | 1.19 | 1.15
A01178  | 197.1009          | 0.339    | Arginine        | 8.47      | 0.98 | 0.91 | 0.93
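
The table-combination step can be sketched as a tolerance-based match on mass, optionally refined by retention time. This is an illustration under stated assumptions: the function `match_features`, the 5 ppm default and the retention time handling are all hypothetical, and the TPMetrics score itself is not computed here because its formula is not described in this section.

```python
def match_features(annotations, features, ppm_tolerance=5.0, rt_tolerance=None):
    """Attach to each annotation the first feature with a matching mass.

    Retention time is compared only if rt_tolerance is given and both
    rows carry an "RT [min]" value.
    """
    merged = []
    for annotation in annotations:
        for feature in features:
            delta_ppm = (
                abs(annotation["Experimental mass"] - feature["Experimental mass"])
                / feature["Experimental mass"] * 1e6
            )
            if delta_ppm > ppm_tolerance:
                continue
            both_have_rt = "RT [min]" in annotation and "RT [min]" in feature
            if rt_tolerance is not None and both_have_rt:
                if abs(annotation["RT [min]"] - feature["RT [min]"]) > rt_tolerance:
                    continue
            # Feature columns first, then annotation columns on top.
            merged.append({**feature, **annotation})
            break
    return merged


annotations = [
    {"Experimental mass": 83.0607, "Name": "Methylimidazole"},
    {"Experimental mass": 104.107, "Name": "Neurine"},
    {"Experimental mass": 83.0607, "Name": "Valinol"},
    {"Experimental mass": 197.1009, "Name": "Arginine"},
]
features = [
    {"Feature": "A08295", "Experimental mass": 83.0607, "RT [min]": 10.82},
    {"Feature": "A01122", "Experimental mass": 104.10704, "RT [min]": 0.338},
    {"Feature": "A01178", "Experimental mass": 197.10092, "RT [min]": 0.339},
]

combined = match_features(annotations, features)
for row in combined:
    print(row["Feature"], row["Name"])
```

A relative (ppm) tolerance is used rather than exact equality because the two tables may report the same mass with different numbers of decimals, as with 104.107 and 104.10704 above.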