The user must supply two input files: a reference dataset and a query dataset. In both sets, the barcode sequences must be in FASTA format, with each sequence identifier in the BOLD (http://www.boldsystems.org/) format. If the species name of a query barcode sequence is not known, a hypothetical name must be supplied, and it must also be in the BOLD format. Examples of a reference set, a query set with known species names, and a query set with hypothetical names are provided below.
INPUT
REFERENCE SET
>EF079971|Ametrida centurio|ROM 98798|COI
ACATTGTACTTACTATTTGGTGCTTGAGCAGGAATAGTAGGTACCGCACTAAGCCTACTTATTCG
TGCAGAACTTGGACAACCTGGGGCTCTATTAGGTGACGACCAAATCTATAATGTTATCGTTACAG
CCCACGCTTTCGTAATGATTTTCTTTATAGTAATACCCATCATGATTGGAGGGTTCGGCAACTGA
CTTGTACCACTAATAATTGGCGCACCTGACATAGCATTCCCACGAATAAATAACATAAGCTTCTG
ACTTCTCCCACCCTCTTTCCTGCTTCTACTGGCCTCCTCAACAGTCGAAGCTGGTGTTGGGACTG
CTTATTT------
>EF079972|Ametrida centurio|ROM 98849|COI
ACATTGTACTTACTATTTGGTGCTTGAGCAGGAATAGTAGGTACCGCACTAAGCCTACTTATTCG
TGCAGAACTTGGACAACCTGGGGCTCTATTAGGTGATGACCAAATCTATAATGTTATCGTTACGG
CCCACGCTTTCGTAATGATTTTCTTTATAGTAATGCCCATCATGATTGGAGGGTTCGGCAACTGA
CTTGTACCACTAATAATCGGCGCACCTGACATAGCATTCCCACGAATAAATAACATAAGCTTCTG
ACTTCTCCCACCCTCTTTCCTACTTCTACTGGCCTCCTCAACAGTTGAAGCTGGTGTTGGGACTG
TAGTC----------
>EF079973|Ametrida centurio|ROM 100832|COI
ACATTGTACTTACTATTTGGTGCTTGAGCAGGAATAGTAGGTACCGCACTAAGCCTACTTATTCG
TGCAGAACTTGGACAACCTGGGGCTCTATTAGGTGATGACCAAATCTATAATGTTATCGTTACAG
CCCACGCTTTCGTAATGATTTTCTTTATAGTAATGCCCATCATGATTGGAGGGTTCGGCAACTGA
TGTCA--
QUERY SET (With known species name)
>EF079975|Ametrida centurio|ROM 101098|COI
ACATTGTACTTACTATTTGGCGCTTGAGCAGGGATAGTAGGTACCGCACTAAGCCTACTTATTCG
TGCAGAACTTGGACAACCTGGGGCTCTATTAGGTGATGACCAAATCTATAATGTTATCGTTACAG
CCCACGCTTTCGTAATGATTTTCTTTATAGTAATGCCCATCATGATTGGAGGGTTCGGCAACTGA
CTTGTACCACTAATAATCGGCGCACCTGACATAGCATTCCCACGAATAAATAACATAAGCTTCTG
ACTTCTCC----
>EF079991|Anoura caudifer|ROM 115346|COI
ACTCTGTACTTACTATTCGGCGCCTGAGCTGGCATAGTAGGTACCGCACTAAGCCTTCTCATCCG
TGCTGAGCTAGGCCAACCCGGAGCCCTGTTAGGTGATGATCAAATTTACAATGTAATCGTAACAG
CCCATGCCTTTGTAATAATTTTCTTCATAGTTATGCCAATTATAATCGGAGGTTTTGGCAATTGA
CTAATCCCCCTAATAATTGGAGCACCTGATATAGCATTTCCTCGGATGAATAATATAAGCTTCTG
ACTTC---
QUERY SET (With hypothetical species name)
>A1|S1 P1|B1 C1|D1
ACTCTATACTTACTGTTTGGTGCCTGAGCCGGTATAGTAGGCACTGCACTTAGCCTTCTCATCCG
CGCCGAATTGGGCCAACCTGGAGCTTTATTAGGTGATGACCAAATCTATAATGTAATCGTAACAG
CTCATGCATTCGTGATAATTTTCTTCATAGTGATACCAATCATAATTGGAGGCTTTGGTAACTGA
CT-----
>A2|S2 P2|B2 C2|D2
ACTCTATACTTACTGTTTGGTGCCTGAGCCGGTATAGTAGGCACTGCACTTAGCCTTCTCATCCG
CGCCGAATTGGGCCAACCTGGAGCTTTATTAGGTGATGACCAAATCTATAATGTAATCGTAACAG
CTCATGCATTCGTGATAATTTTCTTCATAGTGATACCAATCATAATTGGAGGCTTTGGTAACTGA
C---
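Before uploading, it can help to check that every header follows the four-field, pipe-separated layout seen in the examples above. The following is a minimal Python sketch of such a check, not part of the tool itself. The function names (parse_bold_header, read_fasta) and the field labels (id, species, voucher, marker) are assumptions chosen for illustration; the tool may apply different or stricter rules.

```python
def parse_bold_header(line):
    """Split a header such as '>EF079971|Ametrida centurio|ROM 98798|COI'."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    fields = [f.strip() for f in line[1:].strip().split("|")]
    if len(fields) != 4 or not all(fields):
        raise ValueError(f"expected 4 non-empty '|'-separated fields: {line!r}")
    # The field labels below are assumed names, inferred from the examples.
    seq_id, species, voucher, marker = fields
    return {"id": seq_id, "species": species, "voucher": voucher, "marker": marker}


def read_fasta(text):
    """Return a list of (header_dict, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = parse_bold_header(line), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling read_fasta on the contents of a reference or query file raises ValueError at the first header that does not have four non-empty fields; otherwise it returns one record per sequence, with the wrapped sequence lines joined together.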
OUTPUT
The processed results are displayed in two separate text areas, TRAINING RESULT and TEST RESULT, one for the training dataset and the other for the test dataset. The TRAINING RESULT file gives, for each species, the number of observed individuals belonging to that species and the number of those individuals that were correctly identified. The TEST RESULT file gives, for each query barcode, both the hypothetical species label (supplied by the user) and the predicted species label. Links for downloading the result files are provided as Download Training Result and Download Test Result.
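The two per-species numbers described for the TRAINING RESULT can be illustrated with a short Python sketch. This only shows how such counts could be tallied from pairs of known and predicted labels; the function name per_species_counts is hypothetical, and the actual layout of the result files is not specified here.

```python
from collections import Counter


def per_species_counts(true_labels, predicted_labels):
    """Map each species to (observed individuals, correctly identified)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of individuals observed for each species.
    observed = Counter(true_labels)
    # Number of individuals whose predicted label matches the known label.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {species: (observed[species], correct[species]) for species in observed}
```

For example, with known labels for three individuals of one species and predictions that match for two of them, the entry for that species would be (3, 2).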