Feature generation
In iAMPpred, three categories of features are used: compositional features (pseudo amino acid composition and normalized amino acid composition), structural features (α-helix, β-sheet and turn structure propensity) and physico-chemical properties (isoelectric point, hydrophobicity and net charge). The compositional and physico-chemical features were computed using the “Peptide” package of R, whereas the structural features were computed using the TANGO software, available at http://tango.crg.es/.
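To make the idea of a normalized amino acid composition concrete, here is a minimal Python sketch. It is not the R code used by iAMPpred: the function name `amino_acid_composition` is hypothetical, and it assumes that "normalized" means each residue count divided by the number of standard residues in the peptide, which the text above does not spell out.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a peptide.

    Assumption: normalization is count / number of standard residues.
    Non-standard characters (e.g. X) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Counter returns 0 for residues that never occur.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical example peptide, used only for illustration.
composition = amino_acid_composition("GLFKKLAA")
```

Each peptide then contributes a fixed-length vector of 20 fractions that sum to 1, regardless of its length.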
Prediction using support vector machine
Support vector machine (SVM) is used in iAMPpred to classify peptides as AMPs or non-AMPs because it is a non-parametric supervised learning technique that is widely used in bioinformatics and has a sound statistical foundation. The predictive ability of an SVM depends mainly on the kernel function, which maps the input data to a high-dimensional feature space where observations belonging to different classes can be separated linearly by the optimal separating hyperplane. Here, the radial basis function (RBF) was used as the kernel because of its wide and successful application in most AMP prediction studies. The default values of the RBF kernel parameters gamma (gamma = 1/number of attributes) and cost (C = 1) were used to train and test the prediction model. The svm function in the e1071 package of R was used to fit the SVM model, with the scaling option kept as TRUE during training.
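The two settings described above, feature scaling and an RBF kernel with gamma = 1/number of attributes, can be illustrated with a short Python sketch. This is not the e1071 implementation: the helper names `standardize_columns` and `rbf_kernel` are hypothetical, and the scaling shown (zero mean, unit sample standard deviation per feature) is an assumption about what the scaling option does.

```python
import math
import statistics


def standardize_columns(rows):
    """Scale each feature (column) to zero mean and unit sample std. dev.

    Assumption: constant columns are only centered, to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.stdev(col) if len(col) > 1 else 0.0
        if sd == 0.0:
            scaled_columns.append([v - mean for v in col])
        else:
            scaled_columns.append([(v - mean) / sd for v in col])
    # Transpose back to one list per observation.
    return [list(row) for row in zip(*scaled_columns)]


def rbf_kernel(x, y, gamma=None):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2).

    If gamma is not given, it defaults to 1 / number of attributes,
    the default described in the text.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of attributes")
    if gamma is None:
        gamma = 1.0 / len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy feature matrix: 3 observations, 2 attributes (illustrative values only).
scaled = standardize_columns([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
similarity = rbf_kernel(scaled[0], scaled[1])
```

The kernel value is 1 for identical feature vectors and decays towards 0 as they move apart; gamma controls how quickly that decay happens, while the cost parameter C (not shown here) controls the penalty for misclassified training points.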
Training datasets
iAMPpred was trained with 984 antibacterial (ABP) and 984 non-antibacterial (non-ABP) peptides, 739 antiviral (AVP) and 739 non-antiviral (non-AVP) peptides, and 1384 antifungal (AFP) and 1384 non-antifungal (non-AFP) peptides. The antibacterial, antiviral and antifungal peptides were collected from the following public resources.
The antibacterial peptides were collected from the following links.
The antiviral peptides were collected from the following links.
The antifungal peptides were collected from the following links.
Further, non-antibacterial and non-antiviral peptides were collected from AntiBP2 and AVPpred, respectively. Sequence identifiers are present in all of these databases except AntiBP2. Therefore, identifiers for the AntiBP2 sequences were created from the serial numbers of the sequences. For example, the 10th sequence of the AntiBP2 negative dataset is annotated as Non_ABP_010. Because the negative sequences of AVPpred were taken from the negative dataset of AntiBP2, sequence identifiers in the non-AVP dataset also start with Non_ABP.
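The serial-number naming scheme can be sketched in a few lines of Python. The function name `make_identifiers` is hypothetical, and the three-digit zero padding is an assumption inferred from the single example Non_ABP_010; the text does not state the padding rule explicitly.

```python
def make_identifiers(n_sequences, prefix="Non_ABP", width=3):
    """Build identifiers such as Non_ABP_010 from 1-based serial numbers.

    Assumption: serial numbers are zero-padded to `width` digits.
    """
    return [f"{prefix}_{i:0{width}d}" for i in range(1, n_sequences + 1)]


# 984 is the number of non-ABP training peptides given in the text.
identifiers = make_identifiers(984)
tenth = identifiers[9]  # identifier of the 10th sequence
```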
For the positive dataset,
For the negative dataset,
These datasets can be downloaded from the links given below.