# Datasets

**Training set:** The training dataset comprises 1504, 1798, 850, 703, 381, 2754, 180, 156 and 1532 sequences for the localizations cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior and ribosome, respectively. For a given localization, the mRNA sequences of that localization form the positive set, and the sequences of the remaining 8 localizations form the negative set. The training dataset can be downloaded **here**.
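
The positive/negative construction described above can be illustrated with a minimal Python sketch. The counts are copied from the training set description; the names `TRAIN_COUNTS`, `one_vs_rest_sizes` and `one_vs_rest_labels` are hypothetical and not part of any released code, and the sketch assumes each sequence is counted under exactly one localization.

```python
# Per-localization training counts, copied from the description above.
TRAIN_COUNTS = {
    "cytoplasm": 1504,
    "cytosol": 1798,
    "endoplasmic reticulum": 850,
    "exosome": 703,
    "mitochondrion": 381,
    "nucleus": 2754,
    "pseudopodium": 180,
    "posterior": 156,
    "ribosome": 1532,
}


def one_vs_rest_sizes(counts, target):
    """Return (n_positive, n_negative) for one target localization.

    Assumption: every sequence is counted under a single localization,
    so the negative set is simply the sum of the other eight counts.
    """
    n_positive = counts[target]
    n_negative = sum(n for name, n in counts.items() if name != target)
    return n_positive, n_negative


def one_vs_rest_labels(localizations, target):
    """Label each sequence 1 if its localization is the target, else 0."""
    return [1 if loc == target else 0 for loc in localizations]
```

Under these assumptions, `one_vs_rest_sizes(TRAIN_COUNTS, "nucleus")` gives 2754 positive and 7104 negative sequences, which shows how imbalanced the binary problem is for each localization.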

**Independent test set-I:** The independent test set-I comprises 300, 360, 170, 140, 76, 550, 36, 31 and 306 sequences for the same nine localizations, in the order listed for the training set. This dataset can be downloaded **here**.

**Independent test set-II:** The independent test set-II comprises 490, 1037, 485, 185, 14, 1266, 79, 121 and 798 sequences for the same nine localizations, in the order listed for the training set. This dataset can be downloaded **here**.

**Test set-I:** This dataset contains 86, 31, 25 and 83 sequences for the localizations cytoplasm, endoplasmic reticulum, mitochondrion and nucleus, respectively. This dataset can be downloaded **here**.

**Test set-II:** This dataset contains 464, 103, 8 and 508 sequences for the localizations cytoplasm, endoplasmic reticulum, mitochondrion and nucleus, respectively. This dataset can be downloaded **here**.