Dataset

All the datasets that have been used in this study can be obtained as follows:

1. The dataset used to train the ir-HSP server, which contains 2181 HSPs and 2181 non-HSPs, can be downloaded here.

2. The dataset consisting of six different families of HSPs used in this study, viz., 354 HSP20, 1257 HSP40, 159 HSP60, 278 HSP70, 52 HSP90 and 81 HSP100, can be downloaded here. This dataset has also been used to compare the performance of ir-HSP with that of existing approaches, i.e., iHSP-PseRAAC, PredHSP and the approach of Ahmad et al. (2015).

3. The random negative dataset contains 5000 protein sequences randomly selected from the UniProt database, such that no two sequences share more than 40% pairwise sequence identity. This dataset can be obtained from the following link.

4. The dataset containing four different types of DnaJ protein sequences, viz., 63 Type-I, 53 Type-II, 1107 Type-III and 22 Type-IV, can be downloaded from here. This dataset has also been used to assess the performance of the proposed approach in predicting the four classes of DnaJ protein sequences. In addition, the performance of the proposed approach has been compared with that of JPred and JPPRED for predicting the types of HSP40.

5. The dataset containing 12642, 22900, 18801, 14366 and 15233 sequences, corresponding respectively to the Small heat shock protein family (IPR031107), Heat shock protein DnaJ, cysteine-rich domain (IPR001305), Chaperonin Cpn60 (IPR001844), Chaperone DnaK (IPR012725) and Heat shock protein HSP90 family (IPR001404), was collected from the InterPro database. This dataset was also used to assess the performance of ir-HSP and PredHSP. The dataset can be downloaded from here.

6. The independent dataset consists of 96 human HSPs and 55 rice HSPs collected from published research. This dataset was used to compare the prediction accuracy of ir-HSP with that of the PredHSP tool. This dataset can be downloaded from here.
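
After downloading, it may be useful to confirm that each file holds the number of sequences stated in the list above. The following is a minimal sketch, not part of the ir-HSP server itself. It assumes the files are in FASTA format (one header line starting with ">" per sequence), which this page does not state, and the function names are hypothetical. The expected counts are those given for the six HSP families in item 2.

```python
from typing import Dict, Iterable, Tuple

# Sequence counts per family, as listed in item 2 above.
EXPECTED_FAMILY_COUNTS: Dict[str, int] = {
    "HSP20": 354,
    "HSP40": 1257,
    "HSP60": 159,
    "HSP70": 278,
    "HSP90": 52,
    "HSP100": 81,
}


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count sequences, assuming FASTA format: one '>' header per record."""
    return sum(1 for line in lines if line.startswith(">"))


def find_mismatches(
    observed: Dict[str, int], expected: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {family: (observed, expected)} for every family whose count differs."""
    return {
        name: (observed.get(name, 0), want)
        for name, want in expected.items()
        if observed.get(name, 0) != want
    }
```

To use it, open a downloaded file (for example a hypothetical `HSP20.fasta`) with `open(path)` and pass the file object to `count_fasta_records`, collect the results in a dictionary keyed by family name, and pass that dictionary to `find_mismatches`. An empty result means every count agrees with the list above.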