Overview

TurboPutative is a bioinformatics tool that streamlines the putative annotation process in metabolomics. It handles and simplifies the annotation data matrix, facilitating data visualization, the mining of useful information and, ultimately, the prioritization of the candidate metabolites to investigate further. TurboPutative expects as input a table in the format shown in the example file. Additionally, when using TableMerger, a second table must be uploaded.

TurboPutative is designed to handle data obtained by untargeted metabolomics based on liquid chromatography coupled to mass spectrometry with high-resolution analyzers and electrospray ionization (LC/ESI-MS). In this type of study, the list of candidate metabolites is generally obtained with tools (e.g. CEU Mass Mediator 3.0 or MetaboSearch) that compare the experimental monoisotopic mass of each feature with the theoretical masses of the compounds listed in one or several databases (e.g. HMDB, METLIN, KEGG, LipidMaps, among others).

Features detected

Experimental mass
336.8786
98.0577
189.0468

→

Candidate metabolites

Experimental mass	Adduct	mz Error (ppm)	Name
336.8786	M+H-H2O	7	Bithionol
336.8786	M+H-H2O	7	Tetradifon
98.0577	M+Na	1	Trimethylamine-N-oxide
98.0577	M+Na	1	1-Amino-propan-2-ol
98.0577	M+Na	1	3-aminopropan-1-ol
189.0468	M+H	1	2-Chlorobiphenyl
189.0468	M+H	1	4-Chlorobiphenyl
189.0468	M+Na	7	Ethionamide

Despite the use of high-resolution analyzers (Orbitrap or TOF), a large number of candidate metabolites can be assigned to the same feature. Many metabolites share the same molecular formula but differ in structure, which makes them entirely different compounds with different chemical and physical properties. In addition, different metabolites can yield very similar m/z values when ionized as different adducts. The result is a long, complex matrix with several candidates per feature.
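To make the adduct ambiguity concrete, the following minimal Python sketch computes the theoretical m/z of two of the candidates listed above for the feature at 189.0468 (2-Chlorobiphenyl as M+H and Ethionamide as M+Na) and their mass errors. The atomic masses are standard monoisotopic values; the adduct shifts and the definition of the ppm error (relative to the theoretical m/z) are assumptions of this sketch, so the errors may differ slightly from the values reported in the table, since annotation tools do not all define the error the same way.

```python
from __future__ import annotations

# Monoisotopic masses (u) of the most abundant isotope of each element.
ATOM_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
}
PROTON = 1.00727646677
ELECTRON = 0.00054857990946
SODIUM = 22.9897692809

# Mass added to the neutral mass M to obtain a singly charged positive ion.
ADDUCT_SHIFT = {
    "M+H": PROTON,
    "M+Na": SODIUM - ELECTRON,  # Na+ ion = sodium atom minus one electron
}


def theoretical_mz(formula: dict[str, int], adduct: str) -> float:
    """m/z of the [M+X]+ ion for an element-count formula."""
    neutral = sum(ATOM_MASS[element] * count for element, count in formula.items())
    return neutral + ADDUCT_SHIFT[adduct]


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical m/z."""
    return (observed - theoretical) / theoretical * 1e6


observed = 189.0468
candidates = [
    ("2-Chlorobiphenyl", "M+H", {"C": 12, "H": 9, "Cl": 1}),
    ("Ethionamide", "M+Na", {"C": 8, "H": 10, "N": 2, "S": 1}),
]

results = {}
for name, adduct, formula in candidates:
    mz = theoretical_mz(formula, adduct)
    results[name] = ppm_error(observed, mz)
    print(f"{name} [{adduct}]: m/z {mz:.4f}, error {results[name]:+.1f} ppm")
```

Two structurally unrelated compounds, ionized as different adducts, both fall within a few ppm of the same experimental value, which is why mass alone cannot decide between them.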

For these reasons, we developed TurboPutative to help researchers determine which candidate metabolites are the most likely, avoiding the laborious and time-consuming process of manually reviewing each entry (comparing the associated errors and adducts, repeatedly searching databases to identify possible dietary, drug or microbial compounds, etc.).

TurboPutative allows metabolomists to move quickly to the biological interpretation of the results, or to prioritize both the purchase of authentic chemical standards and the MS/MS analyses required for accurate metabolite identification (see the sample results below).

Candidate metabolites (Input)

Experimental mass	Adduct	mz Error (ppm)	Name
336.8786	M+H-H2O	7	Bithionol
336.8786	M+H-H2O	7	Tetradifon
98.0577	M+Na	1	Trimethylamine-N-oxide
98.0577	M+Na	1	1-Amino-propan-2-ol
98.0577	M+Na	1	3-aminopropan-1-ol
189.0468	M+H	1	2-Chlorobiphenyl
189.0468	M+H	1	4-Chlorobiphenyl
189.0468	M+Na	7	Ethionamide

↓

Tagger

Experimental mass	Adduct	mz Error (ppm)	Name	Halogenated	Microbial	Drug
336.8786	M+H-H2O	7	Bithionol	x
336.8786	M+H-H2O	7	Tetradifon	x
98.0577	M+Na	1	Trimethylamine-N-oxide		MC
98.0577	M+Na	1	1-Amino-propan-2-ol
98.0577	M+Na	1	3-aminopropan-1-ol
189.0468	M+H	1	2-Chlorobiphenyl	x
189.0468	M+H	1	4-Chlorobiphenyl	x
189.0468	M+Na	7	Ethionamide			Drug

↓

REname

Experimental mass	Adduct	mz Error (ppm)	Name	Halogenated	Microbial	Drug
336.8786	M+H-H2O	7	Bithionol	x
336.8786	M+H-H2O	7	Tetradifon	x
98.0577	M+Na	1	Trimethylamine-oxide		MC
98.0577	M+Na	1	Amino-propan-ol
189.0468	M+H	1	Chlorobiphenyl	x
189.0468	M+Na	7	Ethionamide			Drug

↓

RowMerger

Experimental mass	Adduct	mz Error (ppm)	Name	Halogenated	Microbial	Drug
336.8786	M+H-H2O	7	Bithionol // Tetradifon	x
98.0577	M+Na	1	Trimethylamine-oxide		MC
98.0577	M+Na	1	Amino-propan-ol
189.0468	M+H	1	Chlorobiphenyl	x
189.0468	M+Na	7	Ethionamide			Drug

Tagger

Tagger is a classifier that detects metabolites belonging to the following categories: nutrients, drugs, microbial metabolites, natural products, plant metabolites, halogenated compounds and peptides. Classification is performed using regular expressions and predefined lists extracted from different databases. Each classification is made as follows:

Nutrients: Nutrient classification uses a predefined list of compounds extracted from HMDB. Specifically, the list contains compounds characterized in HMDB as "Food" or "Food and Nutrition", excluding those of endogenous origin ("Endogenous"). The resulting list was enriched with compound synonyms retrieved from the PubChem database.
Drugs: Drug classification uses a predefined list of compounds extracted from HMDB and DrugBank. Specifically, the list contains all DrugBank compounds and the metabolites characterized in HMDB as "Drug" or "Pharmaceutical industry". In both cases, molecules characterized as "Endogenous" in HMDB were excluded. The resulting list was enriched with compound synonyms retrieved from the PubChem database.
Plants: Plant metabolite classification uses a predefined list of compounds extracted from the PlantCyc database, excluding molecules characterized as "Endogenous" in HMDB. The resulting list was enriched with compound synonyms retrieved from the PubChem database.
Natural Products: Natural product classification uses a predefined list of compounds extracted from the LOTUS database, excluding molecules characterized as "Endogenous" in HMDB. The resulting list was enriched with compound synonyms retrieved from the PubChem database.
Microbiota-dependent: Microbiota-dependent metabolite classification uses an in-house list of compounds and metabolites contained in the Metabolomics Data Explorer (Sonnenburg Laboratory), enriched with their synonyms from the PubChem database.
Halogens: Halogenated compounds are classified using regular expressions. If the input table contains a "Molecular Formula" column, a regular expression that matches halogen element symbols is applied to the formula. Otherwise, a regular expression that detects halogens in the compound name is applied.
Peptides: Peptides are classified using regular expressions that identify compounds made up solely of amino acids.
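The two strategies described above, lookup in a synonym-enriched reference list and a halogen regex applied to the formula when available (otherwise to the name), can be sketched in Python as follows. This is a minimal illustration, not TurboPutative's code: the reference lists are tiny hypothetical stand-ins for the HMDB/DrugBank-derived lists, case-insensitive exact matching is an assumption, and the formula regex is an assumed pattern; only the name regex is taken from the Halogens table.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the curated reference lists (the real ones are
# built from HMDB, DrugBank, PlantCyc, LOTUS, etc. and enriched with PubChem
# synonyms). Names are stored lower-cased for case-insensitive lookup.
REFERENCE_LISTS = {
    "Food": {"ethyl levulinate", "l-pinitol"},
    "Drug": {"ethionamide", "l-pinitol"},
}

# Assumed formula pattern: F, Cl, Br or I not followed by a lower-case letter,
# so that element symbols such as Fe, In or Ir are not mistaken for halogens.
HALOGEN_FORMULA_RE = re.compile(r"(?:F|Cl|Br|I)(?![a-z])")
# Name-based pattern, taken from the Halogens table.
HALOGEN_NAME_RE = re.compile(r"([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od)")


def tag_compound(name: str, formula: Optional[str] = None) -> dict:
    """Return one tag value per category ('' when the compound is not tagged)."""
    tags = {}
    key = name.strip().lower()
    for category, names in REFERENCE_LISTS.items():
        tags[category] = category if key in names else ""
    # Prefer the molecular formula when available; otherwise fall back to the name.
    if formula:
        halogenated = HALOGEN_FORMULA_RE.search(formula) is not None
    else:
        halogenated = HALOGEN_NAME_RE.search(name) is not None
    tags["Halogenated"] = "x" if halogenated else ""
    return tags


for name, formula in [("L-Pinitol", None), ("2-Chlorobiphenyl", "C12H9Cl"),
                      ("Ferrous gluconate", "C12H22FeO14")]:
    print(name, tag_compound(name, formula))
```

The formula check is the more reliable of the two when a formula column exists, because element symbols are unambiguous once symbols like Fe are excluded, whereas name-based matching depends on nomenclature.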

Halogens

Regular Expression	([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od)
Compound 1	14,14,14-Trifluoro-11E-tetradecenyl acetate	match
Compound 2	6-(2-Chloroallylthio)purine	match
Compound 3	Bromhexine	match
Compound 4	Fluorene	no match
Compound 5	Chlorophyll	no match
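The name-based halogen regex (written with plain `|` alternation) can be checked against the example compounds with Python's standard `re` module. The negative lookaheads `(?!ene)` and `(?!ophyl)` are what keep Fluorene (a hydrocarbon) and Chlorophyll from being tagged even though their names begin with "Fluor" and "Chlor".

```python
import re

# Name-based halogen regex from the Halogens table.
HALOGEN_NAME_RE = re.compile(r"([Ff]luor(?!ene)|[Cc]hlor(?!ophyl)|[Bb]rom|[Ii]od)")

compounds = [
    "14,14,14-Trifluoro-11E-tetradecenyl acetate",
    "6-(2-Chloroallylthio)purine",
    "Bromhexine",
    "Fluorene",
    "Chlorophyll",
]

# search() scans the whole name, so the halogen stem can appear anywhere.
matches = {name: HALOGEN_NAME_RE.search(name) is not None for name in compounds}
for name, is_halogenated in matches.items():
    print(f"{name}: {'match' if is_halogenated else 'no match'}")
```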

Peptides

Regular Expression	(?i)^(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|[-,\s]){3,}$
Compound 1	Pro Gly
Compound 2	Glu Ala
Compound 3	Gly Leu Cys
Compound 4	Met-Ala-Ala
Compound 5	Ile,Gly,Cys
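A quick way to see how the peptide regex behaves is to run it with Python's `re` module, using plain `|` alternation and "Lys" without a leading space (with a leading space, names starting with Lys could never match). Because separators (hyphen, comma or whitespace) count as tokens toward `{3,}`, a dipeptide such as "Pro Gly" already has three tokens, while a single residue or a full amino acid name does not qualify.

```python
import re

# Peptide regex from the Peptides table: the whole name must consist of at
# least three tokens, each a three-letter amino acid code or a separator.
PEPTIDE_RE = re.compile(
    r"(?i)^(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser"
    r"|Thr|Trp|Tyr|Val|[-,\s]){3,}$"
)

names = ["Pro Gly", "Glu Ala", "Gly Leu Cys", "Met-Ala-Ala", "Ile,Gly,Cys",
         "Lys Gly Ala", "Glycine", "Gly"]
result = {name: PEPTIDE_RE.match(name) is not None for name in names}
for name, is_peptide in result.items():
    print(f"{name}: {'peptide' if is_peptide else 'not a peptide'}")
```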

After execution, Tagger returns a table similar to the input table, with one additional column for each classification performed:

Experimental mass	Name
102.0681	Isovalerate
162.1131	Ethyl levulinate
195.0876	L-Pinitol
274.123	Gly Leu Cys
401.2131	6alpha-Fluoroprogesterone

↓

Experimental mass	Name	Peptide	Halogenated	Drug	Food
102.0681	Isovalerate
162.1131	Ethyl levulinate				Food
195.0876	L-Pinitol			Drug	Food
274.123	Gly Leu Cys	Peptide
401.2131	6alpha-Fluoroprogesterone		x

REname

Many of the annotated candidate metabolites are isomers, or the same compound listed under different names. In untargeted metabolomics based on LC/MS or CE/MS, the positions of double bonds and functional groups cannot be determined without further analysis in MS/MS mode. Therefore, the candidate metabolite table contains numerous generic annotations, with many repetitions and of limited use to the researcher.

REname facilitates the extraction and visualization of useful information by identifying isomers and equivalent compounds and merging them under a single annotation.

Experimental mass	Name
131.0582	N-acetyl-L-Alanine
131.0582	N-Acetyl-beta-alanine
217.1045	(2R,3R)-heptane-1,2,3-triol
217.1045	heptane-1,2,3-triol
311.1262	6E,8E,14E-Hexadecatriene-10,12-diynoic acid
311.1262	6E,8E,14Z-Hexadecatriene-10,12-diynoic acid

→

Experimental mass	Name
131.0582	acetyl-Alanine
217.1045	heptane-triol
311.1262	Hexadecatriene-diynoate

For this purpose, REname uses a dictionary that maps more than 15,000 compounds to their simplified or generic names. If a compound is not in the dictionary, REname simplifies its name by applying a set of regular expressions in sequence.

Step	Operation	Expression	Name
Input name			Hexadecatriene-10,12-diynoic acid
Section 1	Regex	([-\s])(\d+[,\s]{,2})+-	Hexadecatriene-10,12-diynoic acid
Section 1	Replace	\g<1>	Hexadecatriene-diynoic acid
Section 2	Regex	(?i)ic acid	Hexadecatriene-diynoic acid
Section 2	Replace	ate	Hexadecatriene-diynoate
Output name			Hexadecatriene-diynoate
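The dictionary-first, regex-fallback strategy can be sketched in Python, whose `re` module uses the same `\g<1>` replacement syntax as the table. This is a minimal sketch under stated assumptions: the two dictionary entries are hypothetical, the case-insensitive lookup is an assumption, and only the two regex steps shown in the table are included, whereas REname's real rule set is longer.

```python
import re

# Hypothetical excerpt of the REname dictionary; keys are lower-cased names.
RENAME_DICT = {
    "n-acetyl-l-alanine": "acetyl-Alanine",
    "n-acetyl-beta-alanine": "acetyl-Alanine",
}

# Ordered (pattern, replacement) steps; only the two shown in the table.
REGEX_STEPS = [
    (re.compile(r"([-\s])(\d+[,\s]{,2})+-"), r"\g<1>"),  # remove locants like "-10,12-"
    (re.compile(r"(?i)ic acid"), "ate"),                 # "...ic acid" -> "...ate"
]


def simplify_name(name):
    """Return the generic name: dictionary first, regex rules as fallback."""
    generic = RENAME_DICT.get(name.strip().lower())
    if generic is not None:
        return generic
    for pattern, replacement in REGEX_STEPS:
        name = pattern.sub(replacement, name)
    return name


for name in ["N-Acetyl-beta-alanine", "heptane-1,2,3-triol",
             "Hexadecatriene-10,12-diynoic acid"]:
    print(name, "->", simplify_name(name))
```

Applying the steps in a fixed order matters: the locant-removal step must run before any rule that expects the cleaned name.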

However, peptides and lipids that have fatty acids in their structure (e.g. phospholipids, sphingolipids and glycerolipids) are not processed by means of regular expressions.

In the first case, REname merges under a single annotation the peptides that have the same amino acid composition. For this purpose, amino acids must be written using the three-letter nomenclature.

Experimental mass	Name
388.1555	Pro Pro Met
388.1555	Pro Met Pro
388.1555	Met Pro Pro
433.1408	Trp Asp Asp
433.1408	Asp Asp Trp
433.1408	Asp Trp Asp

→

Experimental mass	Name
388.1555	Met Pro Pro
433.1408	Trp Asp Asp
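Grouping peptides by composition amounts to using an order-independent key, for instance the sorted list of three-letter codes, together with the experimental mass. The Python sketch below is a minimal illustration of that idea, not REname's implementation. Which name is kept as the representative is not specified above; this sketch keeps the first name encountered (an assumption), so for the 388.1555 group it keeps "Pro Pro Met" rather than the "Met Pro Pro" shown in the table.

```python
from __future__ import annotations


def composition_key(name: str) -> tuple[str, ...]:
    """Order-independent key: the sorted three-letter codes of the residues."""
    residues = name.replace("-", " ").replace(",", " ").split()
    return tuple(sorted(code.capitalize() for code in residues))


def merge_peptides(rows: list[tuple[float, str]]) -> list[tuple[float, str]]:
    """Keep one name per (mass, amino acid composition) group."""
    representatives: dict[tuple, str] = {}
    for mass, name in rows:
        # setdefault keeps the first name seen for each group (assumed rule).
        representatives.setdefault((mass, composition_key(name)), name)
    return [(mass, name) for (mass, _), name in representatives.items()]


rows = [
    (388.1555, "Pro Pro Met"), (388.1555, "Pro Met Pro"), (388.1555, "Met Pro Pro"),
    (433.1408, "Trp Asp Asp"), (433.1408, "Asp Asp Trp"), (433.1408, "Asp Trp Asp"),
]
merged = merge_peptides(rows)
print(merged)
```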

Lipids containing fatty acids are processed using the Goslin package. Goslin can parse lipid names from different databases (e.g. LipidMaps, SwissLipids, HMDB) and classify them using a system of parsers and predefined grammars. The information extracted from each lipid name is stored in an object with a fixed structure, which makes the information of interest easy to access. REname extracts the following information from lipid compounds:

Head group (e.g. PE or phosphoethanolamine).
Total number of carbon atoms in fatty acids.
Total number of double bonds in fatty acids.
Type of bond (ether or vinyl ether) of fatty acids.
Number of hydroxyl and methyl groups in fatty acids.

Experimental mass	Name
793.5565	PE-Cer(d16:2(4E,6E)/24:0(2OH))
800.5444	PE(18:0(10(R)Me)/16:0)
812.5478	PC(15:0/18:1(9Z))
812.5478	PC(15:1(9Z)/18:0)
869.5545	PI(18:3(6Z,9Z,12Z)/20:0)
869.5545	PI(20:2(11Z,14Z)/18:1(9Z))

→

Experimental mass	Name
793.5565	PE-Cer(40:2(2OH))
800.5444	PE(34:0(Me))
812.5478	PC(33:1)
869.5545	PI(38:3)
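REname relies on Goslin for this step. The self-contained Python sketch below does not use Goslin; it is a deliberately simplified stand-in that handles only the `C:D` chain notation of the examples above, summing carbon atoms and double bonds over all chains and discarding double-bond positions. Bond type (ether or vinyl ether), hydroxyl and methyl groups are not reported by this sketch, so, for example, "PE-Cer(d16:2(4E,6E)/24:0(2OH))" would become "PE-Cer(40:2)" without the "(2OH)" suffix.

```python
import re

# Optional chain prefix (d/t/m sphingoid bases, O-/P- ether and vinyl ether)
# followed by the number of carbons and double bonds, e.g. "d16:2" or "18:1".
CHAIN_RE = re.compile(r"^(?:[dtm]|O-|P-)?(\d+):(\d+)")


def split_top_level(text, sep="/"):
    """Split on `sep` only where it is not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def sum_composition(lipid):
    """Collapse e.g. 'PC(15:0/18:1(9Z))' into 'PC(33:1)'."""
    head, _, rest = lipid.partition("(")
    inner = rest[:-1]  # strip the parenthesis that closes the head group
    carbons = double_bonds = 0
    for chain in split_top_level(inner):
        match = CHAIN_RE.match(chain)
        if match is None:
            raise ValueError(f"unsupported chain notation: {chain!r}")
        carbons += int(match.group(1))
        double_bonds += int(match.group(2))
    return f"{head}({carbons}:{double_bonds})"


for name in ["PC(15:0/18:1(9Z))", "PC(15:1(9Z)/18:0)",
             "PI(18:3(6Z,9Z,12Z)/20:0)", "PI(20:2(11Z,14Z)/18:1(9Z))"]:
    print(name, "->", sum_composition(name))
```

Isomers that differ only in how carbons and double bonds are distributed between chains collapse to the same sum composition, which is exactly why PC(15:0/18:1(9Z)) and PC(15:1(9Z)/18:0) merge into a single entry.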

RowMerger

RowMerger is an entity comparer that combines information from different annotations according to user-defined criteria, i.e. grouping the annotations assigned to the same feature into a single entry. This step makes data visualization more consistent and, therefore, facilitates the interpretation of the results.

To run RowMerger, the user can specify which annotation properties (i.e. table columns) are considered during merging and which are kept in the resulting table.

By default, RowMerger combines annotations that have the same Experimental mass, Adduct and mz Error (ppm), retaining the Identifier and Name of the merged annotations.

Experimental mass	Identifier	Adduct	Name
83.0607	154396	M+H	Methylimidazole
83.0607	61184	M+H	Fomepizole
171.0628	150549	M+Na	Paratose
171.0628	158202	M+Na	Abequose
171.0628	151197	M+Na	Tyvelose

↓

Experimental mass	Identifier	Adduct	Name
83.0607	154396 // 61184	M+H	Methylimidazole // Fomepizole
171.0628	150549 // 158202 // 151197	M+Na	Paratose // Abequose // Tyvelose
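The merging rule can be sketched in a few lines of Python: rows are grouped by the values of the comparison columns, and the columns to keep are joined with " // ". This is a minimal illustration, not RowMerger's code; the function name and parameters are hypothetical, and removing repeated values within a group is an assumption. Because the example table has no mz Error (ppm) column, the sketch groups by Experimental mass and Adduct only.

```python
def row_merger(rows, group_by, keep, sep=" // "):
    """Merge rows sharing the same values in `group_by`, joining `keep` columns."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups.setdefault(key, []).append(row)

    merged = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for col in keep:
            # dict.fromkeys drops repeated values while preserving their order.
            values = dict.fromkeys(str(member[col]) for member in members)
            out[col] = sep.join(values)
        merged.append(out)
    return merged


rows = [
    {"Experimental mass": "83.0607", "Identifier": "154396", "Adduct": "M+H", "Name": "Methylimidazole"},
    {"Experimental mass": "83.0607", "Identifier": "61184", "Adduct": "M+H", "Name": "Fomepizole"},
    {"Experimental mass": "171.0628", "Identifier": "150549", "Adduct": "M+Na", "Name": "Paratose"},
    {"Experimental mass": "171.0628", "Identifier": "158202", "Adduct": "M+Na", "Name": "Abequose"},
    {"Experimental mass": "171.0628", "Identifier": "151197", "Adduct": "M+Na", "Name": "Tyvelose"},
]
merged = row_merger(rows, group_by=["Experimental mass", "Adduct"], keep=["Identifier", "Name"])
for row in merged:
    print(row)
```

Columns listed in neither `group_by` nor `keep` are dropped from the output, mirroring the user's choice of which properties to retain.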

TPMetrics

TPMetrics applies a multi-criteria scoring algorithm that incorporates analytical correlations and ionization probabilities to identify the most probable annotation. It also allows two datasets to be combined into a single data matrix: the one obtained after data curation with TurboPutative and another dataset containing information deemed useful for interpreting the results.

TPMetrics therefore receives two user-specified tables as input. Equivalent features in both tables are identified by their mass and retention time (if retention time is not available, only the mass is considered). In addition, one of the tables must contain intensity values, which are needed to calculate the score.

Putative Annotations

Experimental mass	Name
83.0607	Methylimidazole
104.107	Neurine
83.0607	Valinol
197.1009	Arginine

Additional information

Feature	Experimental mass	RT [min]	I1	I2	I3
A08295	83.0607	10.82	1.12	1.19	1.15
A01122	104.10704	0.338	0.91	0.97	0.95
A01178	197.10092	0.339	0.98	0.91	0.93

↓

Feature	Experimental mass	RT [min]	Name	TPMetrics	I1	I2	I3
A08295	83.0607	10.82	Methylimidazole	25.37	1.12	1.19	1.15
A01122	104.107	0.338	Neurine	12.02	0.91	0.97	0.95
A08295	83.0607	10.82	Valinol	4.12	1.12	1.19	1.15
A01178	197.1009	0.339	Arginine	8.47	0.98	0.91	0.93
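The feature-matching step (not the scoring, whose formula is not described here) can be sketched in Python: each annotation is joined to every feature whose mass lies within a ppm tolerance, and retention time is compared only when both rows carry it. The 5 ppm tolerance, the optional `rt_tol` parameter and the function names are assumptions for illustration, not TPMetrics settings.

```python
def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Absolute mass difference in parts per million, relative to mass_b."""
    return abs(mass_a - mass_b) / mass_b * 1e6


def match_features(annotations, features, ppm_tol=5.0, rt_tol=None):
    """Join annotations to features by mass (and RT when available)."""
    joined = []
    for ann in annotations:
        for feat in features:
            if ppm_difference(ann["Experimental mass"], feat["Experimental mass"]) > ppm_tol:
                continue
            # Use retention time only when both rows carry it and a tolerance is given.
            if rt_tol is not None and "RT [min]" in ann and "RT [min]" in feat:
                if abs(ann["RT [min]"] - feat["RT [min]"]) > rt_tol:
                    continue
            # Annotation columns (Name, mass) override those of the feature row.
            joined.append({**feat, **ann})
    return joined


annotations = [
    {"Experimental mass": 83.0607, "Name": "Methylimidazole"},
    {"Experimental mass": 104.107, "Name": "Neurine"},
    {"Experimental mass": 83.0607, "Name": "Valinol"},
    {"Experimental mass": 197.1009, "Name": "Arginine"},
]
features = [
    {"Feature": "A08295", "Experimental mass": 83.0607, "RT [min]": 10.82, "I1": 1.12, "I2": 1.19, "I3": 1.15},
    {"Feature": "A01122", "Experimental mass": 104.10704, "RT [min]": 0.338, "I1": 0.91, "I2": 0.97, "I3": 0.95},
    {"Feature": "A01178", "Experimental mass": 197.10092, "RT [min]": 0.339, "I1": 0.98, "I2": 0.91, "I3": 0.93},
]
joined = match_features(annotations, features)
for row in joined:
    print(row["Feature"], row["Name"], row["RT [min]"])
```

Both Methylimidazole and Valinol attach to the same feature (A08295), which is the situation where a score is needed to rank the competing annotations.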