Taxonomy data with SequenceServer

Hey all,

BLAST can output scientific names, common names, BLAST names, and kingdoms for each hit in tabular output. For this to work, the databases must be created with the -taxid option and BLAST must be able to locate NCBI’s taxdb on the local machine. This is helpful when BLASTing against several species, or against the NCBI NR database, for example.
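
For anyone building the databases by hand, here is a minimal sketch of what creating a database with a taxid looks like. The file name and the taxid (13686, Solenopsis invicta) are only illustrative, and the script just prints the command if makeblastdb or the FASTA file isn’t found:

```shell
#!/bin/sh
# Sketch: build a protein BLAST database tagged with a single NCBI taxid.
# FASTA and TAXID are illustrative; substitute your own file and taxid.
FASTA="SI2.2.3.fa"
TAXID=13686   # Solenopsis invicta

# -taxid assigns the same taxid to every sequence in the FASTA file.
CMD="makeblastdb -in $FASTA -dbtype prot -taxid $TAXID"

if command -v makeblastdb >/dev/null 2>&1 && [ -f "$FASTA" ]; then
    $CMD
else
    echo "Would run: $CMD"
fi
```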

SequenceServer 1.0.4 makes it possible to get this taxonomy data via the full tabular report download option. If you are using the NCBI NR database, all you have to do is run:

$ sequenceserver --download-taxdb

If you are using your own database, you will have to tell SequenceServer the taxid of the sequences contained in the FASTA file. First remove the existing BLAST databases. Then run:

$ sequenceserver -m

It will ask you for the taxid. For example:

FASTA file: /Users/priyam/biodb/protein/Solenopsis_invicta/SI2.2.3.fa
FASTA type: protein
Proceed? [y/n] (Default: y):
Enter a database title or will use 'SI 2.2.3 ':

Enter taxid (optional):

Enter the taxid here. You can get the taxid by searching for the species name at http://www.ncbi.nlm.nih.gov/taxonomy. (It’s 13686 for Solenopsis invicta in the above example.)

Once this is done, just download taxdb as above:

$ sequenceserver --download-taxdb

And BLAST! The taxonomy data will be available to you in the full tabular report download.

– Priyam

Hi Priyam,

This seems like it could be really useful. Could you expand a little more on how to enable this?

For example, I have SequenceServer running in Apache via Passenger and so it loads configuration from a custom config.yaml file each time it starts up.

I have previously used “makeblastdb” with the -taxid option, set to the right NCBI taxid, for each BLAST database file in the database_dir.

I tried running $ sequenceserver --download-taxdb and pointing it to my directory, and it downloaded two files to the .sequenceserver directory under my home. They look like they contain everything in NCBI (i.e. I can see Bacteria, Viruses, etc.), whereas my database_dir only has Stramenopiles in it. Leaving the taxdb.* files in .sequenceserver or moving them to the database_dir doesn’t seem to change anything in the results of a BLAST search…

What more do I need to do? I’m a bit lost…

Best,

Guy

Hey Guy,

The taxonomy information is currently available only in the “Full tabular report” download (the second-to-last link in the sidebar on the right), and not in the HTML report. Can you confirm whether it is missing from the full tabular report for you as well?

The files downloaded by ‘sequenceserver --download-taxdb’ must be kept in ~/.sequenceserver for this to work. The downloaded files are NCBI’s taxdb, so they are bound to contain entries for everything on NCBI. They are just a few megabytes in size, which shouldn’t be a problem for anybody.

When you run BLAST and request tabular output (-outfmt 6) with taxdb in the directory from which BLAST is run, BLAST looks up the scientific name, common name, etc. of each hit in taxdb, based on the hit’s taxid, and puts them in the tabular output. SequenceServer facilitates two aspects of this: downloading taxdb (--download-taxdb) and ensuring BLAST will find it (by cd-ing into the ~/.sequenceserver directory when tabular output is requested).
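
Roughly, the lookup described above can be reproduced by hand like this. It’s a sketch, not necessarily the exact column list SequenceServer requests: QUERY and DB are placeholder paths, and staxids, sscinames, scomnames, sblastnames and sskingdoms are the BLAST+ format specifiers for the taxonomy columns.

```shell
#!/bin/sh
# Sketch: request taxonomy columns in BLAST tabular output.
# QUERY and DB are placeholders; point them at your own files.
QUERY="/path/to/query.fa"
DB="/path/to/SI2.2.3.fa"
TAXDB_DIR="$HOME/.sequenceserver"

# Standard 12 columns plus taxid, scientific name, common name,
# BLAST name and kingdom of each hit.
OUTFMT="6 std staxids sscinames scomnames sblastnames sskingdoms"

if command -v blastp >/dev/null 2>&1 \
        && [ -f "$QUERY" ] && [ -f "$TAXDB_DIR/taxdb.bti" ]; then
    # Run from the directory holding taxdb.btd and taxdb.bti so BLAST finds them.
    (cd "$TAXDB_DIR" && blastp -query "$QUERY" -db "$DB" -outfmt "$OUTFMT")
else
    echo "blastp, query or taxdb not found; would use -outfmt \"$OUTFMT\""
fi
```

If the taxonomy columns come back as “N/A”, it usually means BLAST could not find taxdb from where it was run, or the database carries no taxids.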

– Priyam

Hi Priyam,

Yeah I have looked in the full tabular output report and in the taxonomy columns I just get “N/A”…

I can confirm that “taxdb.btd” and “taxdb.bti” are in ~/.sequenceserver and that makeblastdb was definitely invoked with -taxid for each database.

– Guy

Hey Guy,

Maybe the Apache processes run as some other user, say ‘www-data’, and thus ~/.sequenceserver expands to /home/www-data/.sequenceserver, whereas taxdb.btd and taxdb.bti are in /home/guyleonard/.sequenceserver?

If that’s indeed the case, adding the following to config.ru of the Apache deploy (and restarting the app/Apache) is likely to help:

SequenceServer::DOTDIR = "/home/guyleonard/.sequenceserver"

The line should be added after “require ‘sequenceserver’” and before ‘SequenceServer.init’.
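
To check the guess about the user first, something like the following could be run as the user the app runs under. It’s only a sketch: has_taxdb is a helper name made up for this example, and it simply tests for the two taxdb files in that user’s ~/.sequenceserver.

```shell
#!/bin/sh
# has_taxdb DIR: succeed only if both taxdb files are present in DIR.
# (Hypothetical helper for this example, not part of SequenceServer.)
has_taxdb() {
    [ -f "$1/taxdb.btd" ] && [ -f "$1/taxdb.bti" ]
}

# Report which user and home directory this shell sees, then check the dotdir.
echo "user: $(id -un), home: $HOME"
if has_taxdb "$HOME/.sequenceserver"; then
    echo "taxdb found in $HOME/.sequenceserver"
else
    echo "taxdb missing from $HOME/.sequenceserver"
fi
```

Running it once as yourself and once as the web server user (for example via sudo -u www-data, if that is the user in your setup) should show whether the two see different directories.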

– Priyam

Aha yes! Of course, good call! Yep that works nicely.

Cheers!

Just so I don’t forget: I had to put the directory path in double quotes. :)