Algorithm description

A single multi-class classifier would predict only one localization (class) for a given mRNA sequence, namely the one with the highest probability. We therefore developed nine binary classifiers, one for each of the nine localizations, so that more than one localization can be predicted for a given mRNA sequence. Each binary classifier was built from five Random Forest classifiers, with the final class assigned by majority voting. Because sequence data cannot be used directly by machine learning algorithms, the sequence dataset was first transformed into a numeric dataset of k-mer compositional features. In particular, k-mer sizes 1 to 6 were used to generate 5460 compositional features (4 + 16 + 64 + 256 + 1024 + 4096), and the important features were then selected using the Elastic Net statistical model. Only the selected features were used for prediction with the Random Forest method.
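The feature construction and voting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (kmer_vocabulary, kmer_composition, majority_vote) are hypothetical, and several details are assumptions made for illustration: sequences are written over the alphabet A, C, G, T (with U mapped to T), each feature is the relative frequency of a k-mer among the valid windows of its size, and windows containing ambiguous bases are skipped. The Elastic Net selection and Random Forest training steps are not shown.

```python
from itertools import product

# Assumption: sequences use the letters A, C, G, T; U is mapped to T below.
ALPHABET = "ACGT"


def kmer_vocabulary(k_min=1, k_max=6):
    """List every k-mer over ALPHABET for k_min <= k <= k_max, in a fixed order."""
    return ["".join(p)
            for k in range(k_min, k_max + 1)
            for p in product(ALPHABET, repeat=k)]


def kmer_composition(seq, k_min=1, k_max=6):
    """Return k-mer relative frequencies, ordered as in kmer_vocabulary().

    Assumption: frequencies are normalized separately for each k by the
    number of valid windows of that size.
    """
    seq = seq.upper().replace("U", "T")
    features = []
    for k in range(k_min, k_max + 1):
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:  # skip windows with ambiguous bases such as N
                counts[window] += 1
                total += 1
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


def majority_vote(votes):
    """Return 1 if more than half of the binary (0/1) votes are 1, else 0."""
    return int(2 * sum(votes) > len(votes))


if __name__ == "__main__":
    vector = kmer_composition("AUGGCUACGUAGCUAGCUAACGGAU")
    print(len(vector))                     # number of compositional features
    print(majority_vote([1, 0, 1, 1, 0]))  # votes of five hypothetical classifiers
```

With k from 1 to 6 the vocabulary has 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 entries, which matches the number of compositional features stated above. In a full pipeline, one such vector per sequence would form a row of the numeric dataset passed to feature selection.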