The dataset of 438 nif (<90% pairwise identity) and 438 non-nif (<40% pairwise identity) proteins, which is used to train the nifPred server in the first stage of prediction, can be downloaded here. This dataset has also been used to compare the performance of nifPred with that of BLASTP and PSI-BLAST through five-fold cross-validation.
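As an illustration of the five-fold cross-validation mentioned above, the following is a minimal sketch of a stratified split over 438 nif and 438 non-nif labels. It is not the nifPred code: the function name `stratified_kfold`, the random seed, and the round-robin assignment are assumptions made for this example, and the actual sequences and classifier are left out.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Return k folds (lists of indices) with near-equal class ratios.

    Hypothetical helper for illustration; not part of nifPred.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# 438 nif (label 1) and 438 non-nif (label 0) proteins, as in the dataset described above.
labels = [1] * 438 + [0] * 438
folds = stratified_kfold(labels, k=5)

for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```

Each fold serves once as the test set while the remaining four are used for training, so every protein is evaluated exactly once.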
The dataset of 59 nifH, 72 nifD, 86 nifK, 80 nifE, 74 nifB and 80 nifN proteins, which is used for jackknife prediction, can be downloaded here. In each category, pairwise sequence identity is <90%. This dataset has been used to train nifPred at the second stage.
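The jackknife (leave-one-out) procedure named above can be sketched as follows, using only the class sizes listed for this dataset. The `majority_class` function is a hypothetical placeholder that stands in for the real second-stage model, which is not described here; the helper name `jackknife_splits` is likewise an assumption for this example.

```python
from collections import Counter

# Class sizes taken from the dataset description above.
class_sizes = {"nifH": 59, "nifD": 72, "nifK": 86, "nifE": 80, "nifB": 74, "nifN": 80}
labels = [name for name, n in class_sizes.items() for _ in range(n)]

def jackknife_splits(n_samples):
    """Yield (train_indices, test_index), holding out one sample at a time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

def majority_class(train_labels):
    """Placeholder model (hypothetical): predicts the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

correct = 0
for train, held_out in jackknife_splits(len(labels)):
    prediction = majority_class([labels[i] for i in train])
    correct += prediction == labels[held_out]

accuracy = correct / len(labels)
print(f"{len(labels)} rounds, baseline accuracy = {accuracy:.3f}")
```

A real evaluation would replace `majority_class` with a model retrained on the remaining sequences in each round; the placeholder only gives a baseline against which a trained classifier can be compared.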
The first independent dataset, consisting of 83 nifH, 75 nifD, 73 nifK, 71 nifE, 75 nifB and 75 nifN proteins obtained from the study of Dos Santos et al. (2012), can be downloaded here. The second independent dataset, consisting of 2737 nifH, 1007 nifD, 983 nifK, 991 nifE, 1477 nifB and 735 nifN proteins collected from the InterPro database, can be downloaded here.
The proteome-wide dataset for 10 diazotrophs and 10 non-diazotrophs, which is used to evaluate the performance of nifPred, can be downloaded here.
The proteome-wide dataset for four different species, which is used to evaluate the performance of nifPred at a threshold of 0.4, can be downloaded here.