Species identification plays a pivotal role in preserving species diversity. However, identifying species from morphology is often problematic, particularly for immature specimens. DNA barcoding has proven highly successful in this respect, even for closely related species. Computationally, barcode-based species identification can be framed as follows: given a reference library of barcode sequences from specimens of known species, identify an unknown specimen by matching its barcode against those in the library. Although many computational approaches are available in the literature, no dedicated tool exists for identifying fungal species. Moreover, because fungal barcodes are largely confined to the ITS genomic region, existing tools have been found to be less accurate for fungal species. In addition, some existing approaches have been evaluated on only a small number of fungal species, which is insufficient to judge their general predictive ability. With this in view, the server "funbarRF" was developed to identify species with higher accuracy, fungal species in particular.

The barcode sequences are first transformed into numeric feature vectors based on g-spaced dinucleotide compositions, that is, the frequencies of nucleotide pairs separated by g bases. A multiclass random forest supervised learning model is then trained on the encoded dataset and used for prediction.

To run funbarRF, the user must provide a set of reference sequences with known species labels and a set of query sequences with hypothetical labels, both in BOLD format. At least two query sequences are required.
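The feature-encoding step can be illustrated with a short Python sketch. It is not funbarRF's own implementation. It assumes that g counts the bases lying between the two members of a pair (so g = 0 gives ordinary adjacent dinucleotide composition) and that pairs containing ambiguous or gap symbols such as N or - are skipped. The function names and the default `max_gap=2` are hypothetical choices for illustration. Each gap value yields 16 relative frequencies, one per ordered pair AA, AC, ..., TT, and the per-gap vectors are concatenated into one feature vector.

```python
from itertools import product

BASES = "ACGT"
# All 16 ordered nucleotide pairs: AA, AC, ..., TT
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_dinucleotide_composition(seq: str, g: int) -> list[float]:
    """Relative frequency of each pair (X, Y) with g bases between X and Y.

    Assumption (illustrative, not taken from funbarRF): g = 0 means adjacent
    bases, and any pair containing a non-ACGT symbol is skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    step = g + 1  # distance between the two positions of a pair
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:  # skips pairs with N, gaps, or other symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def encode_barcode(seq: str, max_gap: int = 2) -> list[float]:
    """Concatenate compositions for g = 0..max_gap (max_gap is hypothetical)."""
    features: list[float] = []
    for g in range(max_gap + 1):
        features.extend(gapped_dinucleotide_composition(seq, g))
    return features
```

For example, `gapped_dinucleotide_composition("ACGT", 0)` assigns 1/3 each to AC, CG and GT and 0 to the other 13 pairs, and `encode_barcode` with the default `max_gap=2` returns 48 values. In a full pipeline, vectors like these for the reference sequences, together with their species labels, would be used to train a multiclass random forest (for example scikit-learn's `RandomForestClassifier`), which then assigns a species to each encoded query sequence; that step is omitted here to keep the sketch dependency-free.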


Please Cite:

Meher, P. K., Sahu, T. K., Gahoi, S., Tomar, R., & Rao, A. R. (2019). funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genetics, 20(1), 1-13.