BLAST database aliases & taxonomy

Hello,

I have created an alias for a subset of the nr database using a taxid list. My BLAST version is 2.11.0 and my sequenceserver version is 2.0.0.rc8; otherwise everything works well.

When I run sequenceserver, this is what I get:

One or more databases in /mnt/dbssd/db/blast are likely incompatible.

Incompatible databases can cause BLAST searches and other features of
SequenceServer to fail unexpectedly.
You can view incompatible databases and choose to reformat them below.
Alternatively, please remove them from databases directory.

View incompatible databases? [y/n] (Default: y).

y

FASTA file to reformat: /mnt/dbssd/db/blast/subsetdb
FASTA type: protein
Proceed? [y/n] (Default: y): y
Enter a database title or will use ‘subsetdb_name’: xxx

“xxx” stands for whatever I type there; nothing I enter works. When I press Enter, nothing happens and I have to kill sequenceserver. The only file that blastdb_aliastool created was a .pal file containing the recalculated database length and the path to my taxid list.

I tried an alternative approach and defined the taxa inside sequenceserver. Even after downloading the taxdb, I cannot limit the BLAST search in sequenceserver to specific taxa: the options “-taxidlist” and “-taxid” seem to be disallowed.

Is there something I am missing?

Thank you,
Lukasz

Hi Lukasz,

I think what is happening is that SequenceServer is taking a long time to extract taxonomy info from the old database. I am guessing your subset of the nr database is still big enough that the extraction is slow. SequenceServer needs to be smarter about this, but for now you can work around it by putting a ‘.taxid_map.txt’ file next to the database. Since you made the subset, you must know the taxonomy ID for each sequence.
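A minimal sketch of preparing such a map, assuming the file uses the same plain "sequence-id taxid" layout (one pair per line, separated by whitespace) that makeblastdb accepts for its -taxid_map option. The exact file name and layout SequenceServer expects are not confirmed here, and the function names below are hypothetical helpers, not part of SequenceServer or BLAST+.

```python
# Input lines are assumed to look like "ACCESSION TAXID", for example as
# printed by: blastdbcmd -db subsetdb -entry all -outfmt "%a %T"
# (the database name "subsetdb" is taken from the post above).

def parse_id_taxid_lines(lines):
    """Return (sequence_id, taxid) pairs, skipping malformed lines."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank line or unexpected number of columns
        seqid, taxid = fields
        if not taxid.isdigit():
            continue  # e.g. "N/A" when no taxid is recorded
        pairs.append((seqid, int(taxid)))
    return pairs


def format_taxid_map(pairs):
    """Render the pairs as 'seqid taxid' lines (assumed map layout)."""
    return "".join(f"{seqid} {taxid}\n" for seqid, taxid in pairs)
```

The string returned by format_taxid_map would then be written to the map file placed next to the database.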

I am guessing you see an ‘Invalid characters detected in options’ message when you try to use the -taxids option. We can’t support -taxidlist because its argument is a file (I will need to make a note of that in the documentation). But -taxids is allowed. I think there is a bug though: it looks like SequenceServer will throw an error if you provide more than one taxonomy ID separated by commas (-taxids 1,2). I think you can temporarily work around it by adding a comma to the regular expression pattern defined on line 158 of lib/sequenceserver/blast/job.rb.
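To illustrate the idea behind that workaround, here is a small Python sketch of whitelist-style option validation. Both patterns are hypothetical stand-ins; the actual pattern in job.rb is Ruby and may differ. The point is only that a comma-separated taxid list fails until the comma is added to the allowed character set.

```python
import re

# Hypothetical whitelist: letters, digits, hyphen, underscore, dot, space, quote.
STRICT = re.compile(r"\A[a-z0-9\-_. ']*\Z", re.IGNORECASE)

# Same whitelist with a comma added, so "-taxids 1,2" can pass.
WITH_COMMA = re.compile(r"\A[a-z0-9\-_. ',]*\Z", re.IGNORECASE)


def options_allowed(options, pattern):
    """Return True if every character in the options string is whitelisted."""
    return pattern.match(options) is not None


print(options_allowed("-taxids 9606", STRICT))            # single taxid
print(options_allowed("-taxids 9606,10090", STRICT))      # comma rejected
print(options_allowed("-taxids 9606,10090", WITH_COMMA))  # comma accepted
```

Characters such as ";" or "/" remain rejected by both patterns, so widening the set by a comma does not open the door to file paths or shell metacharacters.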

Thanks for bringing both issues to our attention, and I apologise for the bugs.

Priyam

Dear Anurag,

Thank you for your answer. I was not sure I understood what you meant by putting a file next to the database. I then tried creating a subset with a single taxid (9606); sequenceserver created the .taxid_map.txt file and extracted all of the sequences from the database, producing 896.8 MB of files in the end. By creating the subsets I actually wanted to save some space. I thought that only the .pal file was necessary for BLAST to work?

However, the -taxids option does work indeed. Thanks!

Cheers,
Lukasz