Algorithm description
Two aspects are central to the development of any machine learning based algorithm: the generation of numeric features from the sequence dataset and the application of a machine learning method to the resulting numeric dataset. Here, prediction was performed on a one-versus-others basis. That is, for a given localization, the sequences of all other localizations together constituted the negative class of the binary classifier. Since eight localizations were considered, eight binary classifiers were constructed, one for each location: axon, cytoplasm, circulating, exosome, extracellular vesicle, microvesicle, mitochondrion and nucleus.
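The one-versus-others labelling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example annotations are hypothetical, and it assumes each sequence carries a single localization label, which the text does not state.

```python
# The eight localizations named in the text.
LOCATIONS = [
    "axon", "cytoplasm", "circulating", "exosome",
    "extracellular vesicle", "microvesicle", "mitochondrion", "nucleus",
]


def one_vs_others_labels(sample_locations, target):
    """Return 1 for sequences annotated with `target` and 0 for all others."""
    if target not in LOCATIONS:
        raise ValueError(f"unknown location: {target!r}")
    return [1 if loc == target else 0 for loc in sample_locations]


def build_binary_tasks(sample_locations):
    """Map each location to its own binary label vector (eight tasks in total)."""
    return {loc: one_vs_others_labels(sample_locations, loc) for loc in LOCATIONS}


# Hypothetical annotations for five sequences (illustrative only).
example = ["nucleus", "axon", "nucleus", "exosome", "cytoplasm"]
tasks = build_binary_tasks(example)
print(tasks["nucleus"])  # [1, 0, 1, 0, 0]
```

Each of the eight label vectors would then be paired with the same feature matrix to train one binary classifier per location.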
Features: Principal component scores of two feature sets were used: pseudo di-nucleotide compositions (PseDNC) and thermodynamic and structural properties of di-nucleotides (DiPro). The PseDNC features were obtained from the Pse-in-One server, whereas the DiPro features were constructed from the DiProDB database.
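As an illustration of how principal component scores can be derived from a feature matrix, the sketch below centres the columns and projects onto the leading principal axes obtained by singular value decomposition. Several points are assumptions rather than details from the text: the random matrices merely stand in for PseDNC and DiPro features that would in practice come from Pse-in-One and DiProDB, their dimensions and the number of retained components are arbitrary, and the choice to reduce each feature set separately before concatenation is only one possible reading.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature column
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Placeholder matrices: 20 sequences, arbitrary feature dimensions (assumed).
rng = np.random.default_rng(0)
psednc = rng.normal(size=(20, 18))  # stand-in for PseDNC features
dipro = rng.normal(size=(20, 30))   # stand-in for DiPro features

# Assumption: each feature set is reduced separately, then concatenated.
features = np.hstack([pca_scores(psednc, 5), pca_scores(dipro, 5)])
print(features.shape)  # (20, 10)
```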
SVM: A support vector machine with a radial basis function kernel was used as the prediction algorithm. For each localization predictor, different values of gamma and the regularization parameter were adopted based on an optimization strategy.
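The text does not specify the optimization strategy. One common choice, shown here purely as an assumption, is a cross-validated grid search over gamma and the regularization parameter C using scikit-learn. The grid values, the number of folds, the scoring metric and the synthetic data are all hypothetical; in practice the procedure would be repeated once per localization with the corresponding one-versus-others labels.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def tune_rbf_svm(X, y):
    """Grid-search C and gamma for an RBF-kernel SVM with 5-fold CV."""
    # Hypothetical search grid; the values used by the authors are not given.
    param_grid = {
        "C": [0.1, 1, 10, 100],
        "gamma": [1e-3, 1e-2, 1e-1, 1],
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    return search


# Synthetic stand-in for principal component scores and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = tune_rbf_svm(X, y)
print(search.best_params_)  # selected C and gamma for this binary task
```

Because the search is run separately for each binary task, the selected gamma and C can differ between localization predictors, which is consistent with the description above.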