Overview
TurboPutative is a bioinformatic tool that streamlines the putative annotation process in metabolomics: it handles and simplifies the data matrix, facilitates data visualization and the mining of useful information and, ultimately, helps prioritize which candidate metabolites to investigate further.
TurboPutative takes as input a table formatted as shown in the example file. Additionally, if TableMerger is used, a second table must be uploaded.
TurboPutative is designed to handle data obtained by untargeted metabolomics based on liquid chromatography coupled to mass spectrometry (with high-resolution analyzers) using electrospray ionization (LC/ESI-MS). In this type of study, the list of candidate metabolites is generally obtained using tools (e.g. CEU Mass Mediator 3.0 or MetaboSearch) that compare the experimental monoisotopic mass of each feature with the theoretical mass of compounds listed in one or several databases (e.g. HMDB, METLIN, KEGG, LipidMaps, among others).
Features detected
Experimental mass |
336.8786 |
98.0577 |
189.0468 |
→
Candidate metabolites
Experimental mass | Adduct | mz Error (ppm) | Name |
336.8786 | M+H-H2O | 7 | Bithionol |
336.8786 | M+H-H2O | 7 | Tetradifon |
98.0577 | M+Na | 1 | Trimethylamine-N-oxide |
98.0577 | M+Na | 1 | 1-Amino-propan-2-ol |
98.0577 | M+Na | 1 | 3-aminopropan-1-ol |
189.0468 | M+H | 1 | 2-Chlorobiphenyl |
189.0468 | M+H | 1 | 4-Chlorobiphenyl |
189.0468 | M+Na | 7 | Ethionamide |
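The "mz Error (ppm)" column reflects how far the experimental mass lies from the theoretical mass of a candidate. A minimal sketch of that comparison follows, assuming the standard parts-per-million definition; the function names and the default tolerance are illustrative and are not taken from TurboPutative or from the annotation tools mentioned above.

```python
def ppm_error(experimental_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(experimental_mz: float, theoretical_mz: float,
                     tolerance_ppm: float = 10.0) -> bool:
    """True when the absolute ppm error does not exceed the tolerance.

    The 10 ppm default is an arbitrary illustrative value.
    """
    return abs(ppm_error(experimental_mz, theoretical_mz)) <= tolerance_ppm
```

Because the error is relative, the same absolute deviation weighs more heavily on small masses than on large ones.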
Despite the use of high-resolution analyzers (Orbitrap or TOF), a large number of candidate metabolites can be assigned to the same feature. Several metabolites share the same molecular formula but differ in structure, making them distinct compounds with different chemical and physical properties. On top of that, different metabolites can yield very similar m/z values after ionization with different adducts. The result is a long and complicated matrix with several candidates per feature.
For these reasons we have developed TurboPutative, to help researchers determine which metabolites are most likely, avoiding the laborious and time-consuming process of manually reviewing each entry (comparing their associated errors and adducts, frequently searching databases to identify possible dietary, drug, or microorganism compounds, etc.).
TurboPutative allows “Metabolomists” to proceed quickly with the biological interpretation of the results, or to prioritize both the purchase of authentic chemical standards and the analyses in MS/MS mode for accurate metabolite identification (view sample results).
Candidate metabolites (Input)
Experimental mass | Adduct | mz Error (ppm) | Name |
336.8786 | M+H-H2O | 7 | Bithionol |
336.8786 | M+H-H2O | 7 | Tetradifon |
98.0577 | M+Na | 1 | Trimethylamine-N-oxide |
98.0577 | M+Na | 1 | 1-Amino-propan-2-ol |
98.0577 | M+Na | 1 | 3-aminopropan-1-ol |
189.0468 | M+H | 1 | 2-Chlorobiphenyl |
189.0468 | M+H | 1 | 4-Chlorobiphenyl |
189.0468 | M+Na | 7 | Ethionamide |
↓
Tagger
Experimental mass | Adduct | mz Error (ppm) | Name | Halogenated | Microbial | Drug |
336.8786 | M+H-H2O | 7 | Bithionol | x | | |
336.8786 | M+H-H2O | 7 | Tetradifon | x | | |
98.0577 | M+Na | 1 | Trimethylamine-N-oxide | | MC | |
98.0577 | M+Na | 1 | 1-Amino-propan-2-ol | | | |
98.0577 | M+Na | 1 | 3-aminopropan-1-ol | | | |
189.0468 | M+H | 1 | 2-Chlorobiphenyl | x | | |
189.0468 | M+H | 1 | 4-Chlorobiphenyl | x | | |
189.0468 | M+Na | 7 | Ethionamide | | | Drug |
↓
REname
Experimental mass | Adduct | mz Error (ppm) | Name | Halogenated | Microbial | Drug |
336.8786 | M+H-H2O | 7 | Bithionol | x | | |
336.8786 | M+H-H2O | 7 | Tetradifon | x | | |
98.0577 | M+Na | 1 | Trimethylamine-oxide | | MC | |
98.0577 | M+Na | 1 | Amino-propan-ol | | | |
189.0468 | M+H | 1 | Chlorobiphenyl | x | | |
189.0468 | M+Na | 7 | Ethionamide | | | Drug |
↓
RowMerger
Experimental mass | Adduct | mz Error (ppm) | Name | Halogenated | Microbial | Drug |
336.8786 | M+H-H2O | 7 | Bithionol // Tetradifon | x | | |
98.0577 | M+Na | 1 | Trimethylamine-oxide | | MC | |
98.0577 | M+Na | 1 | Aminopropan-ol | | | |
189.0468 | M+H | 1 | Chlorobiphenyl | x | | |
189.0468 | M+Na | 7 | Ethionamide | | | Drug |
Tagger
Tagger is a classifier capable of detecting metabolites characterized as nutrients, drugs, microbial, natural products, plants, halogenated or peptides. The classification is performed using regular expressions and predefined lists extracted from different databases. Each classification is made as follows:
- Nutrients: Nutrient classification is performed using a predefined list of compounds extracted from the HMDB. Specifically, the list contains compounds that are characterized in HMDB as "Food" or as "Food and Nutrition", excluding those of endogenous origin ("Endogenous"). The list obtained was enriched with the synonyms of the compounds extracted from PubChem database.
- Drugs: Drug classification is performed using a predefined list of compounds extracted from the HMDB and DrugBank. Specifically, the list contains all DrugBank compounds and those metabolites that are characterized in HMDB as "Drug" or as "Pharmaceutical industry". In both cases, molecules characterized as “Endogenous” in the HMDB were excluded. The list obtained was enriched with the synonyms of the compounds extracted from PubChem database.
- Plants: Plant metabolites classification is performed using a predefined list of compounds extracted from the PlantCyc database, excluding the molecules characterized as “Endogenous” in the HMDB. The list obtained was enriched with the synonyms of the compounds extracted from PubChem database.
- Natural Products: Natural products classification is performed using a predefined list of compounds extracted from the LOTUS database, excluding the molecules characterized as “Endogenous” in the HMDB. The list obtained was enriched with the synonyms of the compounds extracted from PubChem database.
- Microbiota-dependent: Microbiota-dependent metabolites classification is performed using an in-house list of compounds and metabolites contained in Metabolomics Data Explorer (Sonnenburg Laboratory), enriched with their synonyms contained in PubChem database.
- Halogens: Halogenated compounds are classified using regular expressions. If the input table contains a column with the “Molecular Formula” of the compounds, a regular expression that matches halogen elements is applied to it. Otherwise, a regular expression that identifies the presence of halogens in the name of the compound is applied.
- Peptides: Peptides are classified using regular expressions that can identify compounds that are made up solely of amino acids.
Halogens
Regular Expression | ([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od) |
Compound 1 | 14,14,14-Trifluoro-11E-tetradecenyl acetate |
Compound 2 | 6-(2-Chloroallylthio)purine |
Compound 3 | Bromhexine |
Compound 4 | Fluorene |
Compound 5 | Chlorophyl |
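The name-based halogen check can be tried directly with Python's `re` module. The pattern below is copied from the table; the function name `is_halogenated` is illustrative, and this sketch does not cover the separate formula-based expression, which is not shown in this documentation.

```python
import re

# Pattern copied from the Halogens table. The negative lookaheads keep
# "Fluorene" and "Chlorophyl" from being flagged as halogenated.
HALOGEN_NAME = re.compile(r"([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od)")


def is_halogenated(name: str) -> bool:
    """True when the compound name contains a halogen-related fragment."""
    return HALOGEN_NAME.search(name) is not None


examples = [
    "14,14,14-Trifluoro-11E-tetradecenyl acetate",
    "6-(2-Chloroallylthio)purine",
    "Bromhexine",
    "Fluorene",
    "Chlorophyl",
]
flags = {name: is_halogenated(name) for name in examples}
```

With this pattern, compounds 1 to 3 are flagged, while compounds 4 and 5 are rejected by the lookaheads.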
Peptides
Regular Expression | (?i)^(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|[-,\s]){3,}$ |
Compound 1 | Pro Gly |
Compound 2 | Glu Ala |
Compound 3 | Gly Leu Cys |
Compound 4 | Met-Ala-Ala |
Compound 5 | Ile,Gly,Cys |
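The peptide expression can be exercised in the same way. This is a minimal sketch using the pattern from the table; the function name `is_peptide` is illustrative. Note that separators count towards the `{3,}` repetition, so a dipeptide such as "Pro Gly" (residue, space, residue) already satisfies it.

```python
import re

# Pattern copied from the Peptides table, split over two string literals
# for readability only. The inline (?i) flag makes it case-insensitive.
PEPTIDE = re.compile(
    r"(?i)^(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|"
    r"Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|[-,\s]){3,}$"
)


def is_peptide(name: str) -> bool:
    """True when the name consists solely of three-letter amino acid codes
    joined by spaces, hyphens or commas."""
    return PEPTIDE.match(name) is not None
```

Names such as "Isovalerate" or "Alanine" are rejected because they contain characters outside the three-letter codes and separators.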
After execution, the program generates a table similar to the input table with an additional column for each classification performed:
Experimental mass | Name |
102.0681 | Isovalerate |
162.1131 | Ethyl levulinate |
195.0876 | L-Pinitol |
274.123 | Gly Leu Cys |
401.2131 | 6alpha-Fluoroprogesterone |
↓
Experimental mass | Name | Peptide | Halogenated | Drug | Food |
102.0681 | Isovalerate | | | | |
162.1131 | Ethyl levulinate | | | | Food |
195.0876 | L-Pinitol | | | Drug | Food |
274.123 | Gly Leu Cys | Peptide | | | |
401.2131 | 6alpha-Fluoroprogesterone | | x | | |
REname
Many of the annotated candidate metabolites are isomers or identical compounds with different nomenclature. In untargeted metabolomics based on LC/MS or CE/MS it is not possible to identify the position of double bonds and functional groups without performing further analysis in MS/MS mode. Therefore, the table with candidate metabolites contains numerous generic annotations, with many repetitions and limited utility for the researcher.
REname facilitates the extraction and visualization of useful information by identifying isomers and equivalent compounds and merging them under a single annotation.
Experimental mass | Name |
131.0582 | N-acetyl-L-Alanine |
131.0582 | N-Acetyl-beta-alanine |
217.1045 | (2R,3R)-heptane-1,2,3-triol |
217.1045 | heptane-1,2,3-triol |
311.1262 | 6E,8E,14E-Hexadecatriene-10,12-diynoic acid |
311.1262 | 6E,8E,14Z-Hexadecatriene-10,12-diynoic acid |
→
Experimental mass | Name |
131.0582 | acetyl-Alanine |
217.1045 | heptane-triol |
311.1262 | Hexadecatriene-diynoate |
For this purpose, REname uses a dictionary of more than 15,000 compounds, each associated with its simplified or generic name. If a compound is not contained in the dictionary, REname processes it by applying a set of regular expressions that sequentially simplify the compound name.
Input name | Hexadecatriene-10,12-diynoic acid |
↓
Section 1 | Regex | ([-\s])(\d+[,\s]{,2})+- | Hexadecatriene-10,12-diynoic acid |
Section 1 | Replace | \g<1> | Hexadecatriene-diynoic acid |
Section 2 | Regex | (?i)ic acid | Hexadecatriene-diynoic acid |
Section 2 | Replace | ate | Hexadecatriene-diynoate |
Output name | Hexadecatriene-diynoate |
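The two sections above can be reproduced as a sequence of substitutions. This is a minimal sketch containing only the two rules shown in the table, not the full rule set used by REname; `RULES` and `simplify_name` are illustrative names.

```python
import re

# Each rule is (pattern, replacement), applied in order.
RULES = [
    # Section 1: drop position numbers such as "-10,12-", keeping the
    # leading separator captured in group 1.
    (re.compile(r"([-\s])(\d+[,\s]{,2})+-"), r"\g<1>"),
    # Section 2: rewrite the "-ic acid" ending as "-ate".
    (re.compile(r"(?i)ic acid"), "ate"),
]


def simplify_name(name: str) -> str:
    """Apply every rule in sequence to the compound name."""
    for pattern, replacement in RULES:
        name = pattern.sub(replacement, name)
    return name


simplified = simplify_name("Hexadecatriene-10,12-diynoic acid")
```

Applying the rules in a fixed order matters, since later rules operate on the output of earlier ones.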
However, peptides and lipids that have fatty acids in their structure (e.g. phospholipids, sphingolipids and glycerolipids) are not processed by means of regular expressions.
In the first case, REname combines under a single annotation those peptides that have the same amino acid composition. For this purpose, amino acids must be expressed in the three-letter nomenclature.
Experimental mass | Name |
388.1555 | Pro Pro Met |
388.1555 | Pro Met Pro |
388.1555 | Met Pro Pro |
433.1408 | Trp Asp Asp |
433.1408 | Asp Asp Trp |
433.1408 | Asp Trp Asp |
→
Experimental mass | Name |
388.1555 | Met Pro Pro |
433.1408 | Trp Asp Asp |
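Grouping peptides by amino acid composition can be sketched by sorting the residues of each name and using the result as a key. This is an illustration of the idea only: the function names are hypothetical, and keeping the first name seen in each group is an assumption, since the rule REname uses to pick the representative name is not stated here.

```python
import re


def composition_key(peptide: str) -> tuple:
    """Order-independent key: the sorted three-letter residues of the peptide."""
    residues = re.split(r"[-,\s]+", peptide.strip())
    return tuple(sorted(residue.capitalize() for residue in residues))


def merge_peptides(rows):
    """rows: iterable of (experimental_mass, name) pairs.

    Keeps one row per (mass, composition); the first name encountered is
    retained (an assumption made for this sketch).
    """
    merged = {}
    for mass, name in rows:
        merged.setdefault((mass, composition_key(name)), name)
    return [(mass, name) for (mass, _key), name in merged.items()]


rows = [
    (388.1555, "Pro Pro Met"), (388.1555, "Pro Met Pro"), (388.1555, "Met Pro Pro"),
    (433.1408, "Trp Asp Asp"), (433.1408, "Asp Asp Trp"), (433.1408, "Asp Trp Asp"),
]
merged_rows = merge_peptides(rows)  # two rows, one per composition
```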
In the case of fatty acid lipids, compounds are processed using the Goslin package. Goslin can process the name of lipids coming from different databases (e.g. LipidMaps, SwissLipids, HMDB) and classify them thanks to a system of parsers and predefined grammars. The information extracted from the name of the lipid is stored in an object with a fixed structure, which facilitates the access to the information of interest. REname will extract the following information from lipid compounds:
- Head group (e.g. PE or phosphoethanolamine).
- Total number of carbon atoms in fatty acids.
- Total number of double bonds in fatty acids.
- Type of bond (ether or vinyl ether) of fatty acids.
- Number of hydroxyl and methyl groups in fatty acids.
Experimental mass | Name |
793.5565 | PE-Cer(d16:2(4E,6E)/24:0(2OH)) |
800.5444 | PE(18:0(10(R)Me)/16:0) |
812.5478 | PC(15:0/18:1(9Z)) |
812.5478 | PC(15:1(9Z)/18:0) |
869.5545 | PI(18:3(6Z,9Z,12Z)/20:0) |
869.5545 | PI(20:2(11Z,14Z)/18:1(9Z)) |
→
Experimental mass | Name |
793.5565 | PE-Cer(40:2(2OH)) |
800.5444 | PE(34:0(Me)) |
812.5478 | PC(33:1) |
869.5545 | PI(38:3) |
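To illustrate the summing of carbon atoms and double bonds, the sketch below handles simple shorthand names with plain string parsing. It deliberately does not use Goslin and is not how REname is implemented: it ignores hydroxyl and methyl groups and ether bonds, so it would return `PE-Cer(40:2)` rather than `PE-Cer(40:2(2OH))`. All function names are illustrative.

```python
import re

# Optional one-letter prefix (e.g. "d" in d16:2), then carbons:double_bonds.
CHAIN = re.compile(r"^[A-Za-z]?(\d+):(\d+)")


def split_chains(body: str) -> list:
    """Split the fatty acid list on '/' only outside nested parentheses."""
    chains, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "/" and depth == 0:
            chains.append(current)
            current = ""
        else:
            current += ch
    chains.append(current)
    return chains


def species_level(name: str) -> str:
    """Collapse e.g. 'PC(15:0/18:1(9Z))' to 'PC(33:1)'.

    Names that do not fit the expected shape are returned unchanged.
    """
    head, sep, rest = name.partition("(")
    if not sep or not rest.endswith(")"):
        return name
    carbons = bonds = 0
    for chain in split_chains(rest[:-1]):
        match = CHAIN.match(chain)
        if match is None:
            return name
        carbons += int(match.group(1))
        bonds += int(match.group(2))
    return f"{head}({carbons}:{bonds})"
```

Under this simplification, `PC(15:0/18:1(9Z))` and `PC(15:1(9Z)/18:0)` both collapse to `PC(33:1)`, which is why they end up under a single annotation.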
RowMerger
RowMerger is an entity comparer that combines information from different annotations following user-defined criteria, i.e. it groups annotations assigned to the same feature into a single entry. This step makes data visualization more consistent and therefore eases the interpretation of the results.
To execute RowMerger, the user can specify which annotation properties (i.e. table columns) will be considered during the merger and which ones will be kept in the resulting table.
By default, RowMerger will combine annotations that have the same Experimental mass, Adduct and mz Error (ppm), retaining the Identifier and the Name of merged annotations.
Experimental mass | Identifier | Adduct | Name |
83.0607 | 154396 | M+H | Methylimidazole |
83.0607 | 61184 | M+H | Fomepizole |
171.0628 | 150549 | M+Na | Paratose |
171.0628 | 158202 | M+Na | Abequose |
171.0628 | 151197 | M+Na | Tyvelose |
↓
Experimental mass | Identifier | Adduct | Name |
83.0607 | 154396 // 61184 | M+H | Methylimidazole // Fomepizole |
171.0628 | 150549 // 158202 // 151197 | M+Na | Paratose // Abequose // Tyvelose |
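The merge shown above amounts to grouping rows on the compared columns and joining the retained columns with " // ". A minimal sketch, assuming rows are plain dictionaries; `merge_rows` and its parameter names are illustrative, not RowMerger's actual interface.

```python
def merge_rows(rows, compare_cols, keep_cols, sep=" // "):
    """Group rows sharing the same values in compare_cols and join the
    values of keep_cols with sep, preserving first-seen order."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in compare_cols)
        groups.setdefault(key, []).append(row)

    merged = []
    for key, members in groups.items():
        out = dict(zip(compare_cols, key))
        for col in keep_cols:
            out[col] = sep.join(str(member[col]) for member in members)
        merged.append(out)
    return merged


rows = [
    {"Experimental mass": 83.0607, "Identifier": 154396, "Adduct": "M+H", "Name": "Methylimidazole"},
    {"Experimental mass": 83.0607, "Identifier": 61184, "Adduct": "M+H", "Name": "Fomepizole"},
    {"Experimental mass": 171.0628, "Identifier": 150549, "Adduct": "M+Na", "Name": "Paratose"},
    {"Experimental mass": 171.0628, "Identifier": 158202, "Adduct": "M+Na", "Name": "Abequose"},
    {"Experimental mass": 171.0628, "Identifier": 151197, "Adduct": "M+Na", "Name": "Tyvelose"},
]
merged = merge_rows(rows, ["Experimental mass", "Adduct"], ["Identifier", "Name"])
```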
TPMetrics
TPMetrics applies a multi-criteria scoring algorithm that incorporates analytical correlations and ionization probabilities to identify the most probable annotation. It also allows two different datasets to be combined into a single data matrix: the one obtained after data curation by TurboPutative and another dataset containing data deemed useful for the interpretation of the results.
TPMetrics therefore receives as input two tables specified by the user. Equivalent features in both tables are identified by their mass and their retention time (if the retention time is not present, only the mass is considered). In addition, one of the tables must contain intensity values so that the score can be calculated.
Putative Annotations
Experimental mass | Name |
83.0607 | Methylimidazole |
104.107 | Neurine |
83.0607 | Valinol |
197.1009 | Arginine |
+
Additional information
Feature | Experimental mass | RT [min] | I1 | I2 | I3 |
A08295 | 83.0607 | 10.82 | 1.12 | 1.19 | 1.15 |
A01122 | 104.10704 | 0.338 | 0.91 | 0.97 | 0.95 |
A01178 | 197.10092 | 0.339 | 0.98 | 0.91 | 0.93 |
↓
Feature | Experimental mass | RT [min] | Name | TPMetrics | I1 | I2 | I3 |
A08295 | 83.0607 | 10.82 | Methylimidazole | 25.37 | 1.12 | 1.19 | 1.15 |
A01122 | 104.107 | 0.338 | Neurine | 12.02 | 0.91 | 0.97 | 0.95 |
A08295 | 83.0607 | 10.82 | Valinol | 4.12 | 1.12 | 1.19 | 1.15 |
A01178 | 197.1009 | 0.339 | Arginine | 8.47 | 0.98 | 0.91 | 0.93 |
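The table-matching step (not the scoring itself) can be sketched as a tolerance-based lookup on mass and, when both tables provide it, retention time. The tolerances, dictionary keys and function name below are assumptions made for illustration; the actual matching criteria and the TPMetrics score calculation are not described here.

```python
def match_features(annotations, features, mass_tol=0.001, rt_tol=0.1):
    """Attach to each annotation the first feature with a compatible mass
    (and retention time, when both rows carry one).

    mass_tol (Da) and rt_tol (min) are illustrative values only.
    """
    merged = []
    for ann in annotations:
        for feat in features:
            if abs(ann["mass"] - feat["mass"]) > mass_tol:
                continue
            # Retention time is compared only when present in both rows.
            if "rt" in ann and "rt" in feat and abs(ann["rt"] - feat["rt"]) > rt_tol:
                continue
            # Annotation values take precedence over feature values.
            merged.append({**feat, **ann})
            break
    return merged


annotations = [
    {"mass": 83.0607, "name": "Methylimidazole"},
    {"mass": 104.107, "name": "Neurine"},
    {"mass": 83.0607, "name": "Valinol"},
    {"mass": 197.1009, "name": "Arginine"},
]
features = [
    {"feature": "A08295", "mass": 83.0607, "rt": 10.82, "I1": 1.12, "I2": 1.19, "I3": 1.15},
    {"feature": "A01122", "mass": 104.10704, "rt": 0.338, "I1": 0.91, "I2": 0.97, "I3": 0.95},
    {"feature": "A01178", "mass": 197.10092, "rt": 0.339, "I1": 0.98, "I2": 0.91, "I3": 0.93},
]
combined = match_features(annotations, features)
```

As in the example tables, a feature may be attached to several annotations (here A08295 is shared by Methylimidazole and Valinol).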