
      RiPPMiner-Genome: A web resource for automated prediction of cross-linked chemical structures of RiPPs
This website is free and open to all users and there is no login requirement




DATASET

The current version of RiPPMiner-Genome was developed using a dataset of 259 experimentally characterized RiPP BGCs (106 lanthipeptides, 55 lassopeptides, 31 thiopeptides, 26 cyanobactins, 12 glycocins, 10 LAPs, 10 head-to-tail peptides, 6 linaridins and 3 bottromycins) for which both the genomic sequences and the chemical structures of the RiPP biosynthetic products are known. These 259 RiPPs with BGC information include cases where a single BGC produces multiple RiPPs, either from different RiPP precursor peptides or through different types of PTMs on a single precursor peptide; the number of unique RiPP BGCs is therefore 204. The sequence and chemical structure information from this dataset was used to train and validate the machine learning (ML) classifier and to benchmark the overall performance of cross-linked chemical structure prediction starting from genomic sequences. The dataset is available in spreadsheet format as Known_RiPPs_BGC.xlsx.
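As a quick consistency check on the figures above, the per-class counts can be tallied and compared with the stated totals. This is only an illustrative sketch: the dictionary keys and variable names are chosen here for readability and are not taken from Known_RiPPs_BGC.xlsx.

```python
# Per-class counts exactly as listed in the dataset description.
ripp_counts = {
    "lanthipeptides": 106,
    "lassopeptides": 55,
    "thiopeptides": 31,
    "cyanobactins": 26,
    "glycocins": 12,
    "LAPs": 10,
    "head-to-tail peptides": 10,
    "linaridins": 6,
    "bottromycins": 3,
}

total_ripps = sum(ripp_counts.values())
unique_bgcs = 204  # stated in the text

# Entries beyond one per unique BGC, i.e. additional RiPPs that share
# a BGC with another entry in the dataset.
extra_entries_sharing_a_bgc = total_ripps - unique_bgcs

print(total_ripps, extra_entries_sharing_a_bgc)
```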


BENCHMARKING RESULTS FOR RiPPMINER-GENOME

The RiPP BGC and its modifying enzymes were correctly identified, and the RiPP class correctly predicted, in 98% of cases across 9 different RiPP families. In more than 89% of cases, the machine learning approach identified the correct precursor peptide for these 9 RiPP families. For four major RiPP families, RiPPMiner-Genome could rank the correct cleavage and cross-link pattern among the top three predictions.
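The "among the top three predictions" criterion can be read as a top-k hit rate. The sketch below shows one way to compute such a rate; the function name, the pattern labels and the toy rankings are all hypothetical and are not output from RiPPMiner-Genome.

```python
def top_k_hit_rate(ranked_predictions, true_patterns, k=3):
    """Fraction of cases whose true pattern appears in the first k ranked predictions."""
    if len(ranked_predictions) != len(true_patterns):
        raise ValueError("one ranked list is needed per true pattern")
    hits = 0
    for ranked, truth in zip(ranked_predictions, true_patterns):
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_patterns)


# Toy data (hypothetical labels): each inner list is ordered best-first.
ranked_predictions = [
    ["pattern_A", "pattern_B", "pattern_C", "pattern_D"],
    ["pattern_X", "pattern_Y", "pattern_Z"],
    ["pattern_P", "pattern_Q", "pattern_R", "pattern_S"],
    ["pattern_M", "pattern_N"],
]
true_patterns = ["pattern_B", "pattern_X", "pattern_S", "pattern_N"]

top3 = top_k_hit_rate(ranked_predictions, true_patterns, k=3)
top1 = top_k_hit_rate(ranked_predictions, true_patterns, k=1)
print(top3, top1)
```

In the toy data the third case fails the top-three test because its true pattern is ranked fourth.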

Table2























Summary of benchmarking results for predictions of RiPP precursors



























BENCHMARKING OF THE MACHINE LEARNING MODEL FOR PREDICTION OF THIOPEPTIDE CLEAVAGE



Benchmarking was carried out on the thiopeptide dataset provided by Schwalen J. C. et al. (PMID: 29983054), and the predictions were compared with those from RODEO.

Dataset                     ROC-AUC
613 Thiopeptides (RODEO)    0.99
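For readers unfamiliar with the metric, ROC-AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the scores and labels are toy values, not data from the thiopeptide benchmark.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; library implementations typically use a sort-based computation instead.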













BENCHMARKING RESULTS FOR THIOPEPTIDE CROSS-LINK PREDICTION



Benchmarking was carried out on the experimentally characterized thiopeptide BGCs.

Method: Random Forest (RF)

Total    Positive Set    Negative Set    2-Fold AUC    10-Fold AUC
198      29              169             0.89          0.92
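To give a sense of what 2-fold and 10-fold cross-validation mean for a set of this size and class balance, the sketch below partitions 29 positive and 169 negative examples into k folds. It assumes a stratified split without shuffling, which is an illustrative choice made here; the page does not state how the folds were actually constructed.

```python
def stratified_folds(labels, k):
    """Assign example indices to k folds, spreading each class evenly (round-robin)."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


# Class sizes taken from the table: 29 positives, 169 negatives.
labels = [1] * 29 + [0] * 169

summary = {}
for k in (2, 10):
    folds = stratified_folds(labels, k)
    sizes = [len(fold) for fold in folds]
    positives = [sum(labels[i] for i in fold) for fold in folds]
    summary[k] = (sizes, positives)
    print(k, sizes, positives)
```

With 10 folds each held-out fold contains only two or three positive examples, so per-fold estimates are noisier than with 2 folds, while each model is trained on a larger share of the data.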













BENCHMARKING RESULTS FOR LANTHIPEPTIDE MODIFIED RESIDUES PREDICTION

Method: Support Vector Machine (SVM)
False Positive Rate (FPR %)    True Positive Rate (TPR %)
10                             78
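The two rates in the table follow the standard definitions: TPR is the percentage of truly modified residues predicted as modified, and FPR is the percentage of truly unmodified residues predicted as modified. A minimal sketch with toy labels (not the benchmark data, and with a hypothetical function name):

```python
def tpr_fpr_percent(predicted, actual):
    """Return (TPR %, FPR %) for binary labels where 1 = modified, 0 = unmodified."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in the true labels")
    tpr = 100.0 * tp / (tp + fn)
    fpr = 100.0 * fp / (fp + tn)
    return tpr, fpr


# Toy example: 4 truly modified residues, 5 truly unmodified residues.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0]

tpr, fpr = tpr_fpr_percent(predicted, actual)
print(tpr, fpr)
```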















The machine learning (ML) model was trained on 49 lanthipeptides and tested on the remaining 49 lanthipeptides in a blind test. The results shown here are the percent accuracy for each of these 49 lanthipeptides. The percent accuracy was calculated as the number of Ser/Thr/Cys residues whose modification states were correctly predicted by the ML model, divided by the total number of such residues present in the core region of each lanthipeptide. The average accuracy was 76%.
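The accuracy calculation described above can be sketched as follows. The core sequences, the "modified"/"unmodified" state labels and the function name are hypothetical placeholders; the actual label scheme used by the model is not specified on this page.

```python
MODIFIABLE_RESIDUES = set("STC")  # Ser, Thr and Cys in one-letter code


def modification_accuracy(core, predicted_states, true_states):
    """Percent of Ser/Thr/Cys positions in the core whose predicted state matches the true state.

    predicted_states and true_states map 0-based core positions to state labels.
    """
    positions = [i for i, residue in enumerate(core) if residue in MODIFIABLE_RESIDUES]
    if not positions:
        raise ValueError("core contains no Ser/Thr/Cys residues")
    correct = sum(1 for i in positions if predicted_states[i] == true_states[i])
    return 100.0 * correct / len(positions)


# Two hypothetical core peptides with hypothetical state annotations.
acc_1 = modification_accuracy(
    "ASTCGSC",
    predicted_states={1: "modified", 2: "unmodified", 3: "modified", 5: "unmodified", 6: "modified"},
    true_states={1: "modified", 2: "modified", 3: "modified", 5: "unmodified", 6: "modified"},
)
acc_2 = modification_accuracy(
    "GCTS",
    predicted_states={1: "modified", 2: "modified", 3: "unmodified"},
    true_states={1: "modified", 2: "modified", 3: "unmodified"},
)

# The reported average is the mean of the per-peptide percentages.
average_accuracy = (acc_1 + acc_2) / 2
print(acc_1, acc_2, average_accuracy)
```

Averaging per-peptide percentages weights every peptide equally regardless of how many Ser/Thr/Cys residues its core contains, which differs from pooling all residues into a single ratio.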