sequenceserver startup error FATAL : FAIL : Error parsing blast databases

Hi, I’m trying to start SequenceServer for the first time (Ubuntu 14.04) and am getting the error below. I have just downloaded and unpacked the nr and nt databases, so the file it complains about, ‘nr.nin’, should be up to date.

BLAST Database error: CSeqDBAtlas::MapMmap: While mapping file [/home/scott/blast_databases/nr.nin] with 0 bytes allocated, caught exception:
NCBI C++ Exception:
T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_10349_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1414443165/c++/compilers/unix/…/…/src/objtools/blast/seqdb_reader/seqdbatlas.cpp", line 152: Error: BLASTDB::ncbi::SeqDB_ThrowException() - Validation failed: [end <= file_size] at /home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_10349_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1414443165/c++/compilers/unix/…/…/src/objtools/blast/seqdb_reader/seqdbatlas.cpp:506

Any ideas?
(Note: local command-line queries against the databases seem to work fine, e.g. ‘blastn -db /home/scott/blast_databases/nt -query /home/scott/blast_databases/TaqChromoDraft4.fa’.)

thanks! Scott

Not sure why you would get this error. Which version of BLAST+ are you using? Are you using the 0.8.7 gem release of SequenceServer, or did you pull it from GitHub?

– Priyam

Yes, the 0.8.7 gem release, and blast+ 2.2.30+ binaries. Perhaps I should compile blast+ from source?

Hi Scott,

blastp seems to be working just fine:

blastp -db nr -query DNApolymeraseXfamily.fa -out DNAPolx.txt

Output (truncated):

`

52,937,773 sequences; 19,054,698,734 total letters

Query= Annotated_Chromosome_Draft_4d_0_gaps_left_CLOSED_AND_FINISHED_transl
ation_of_CDS_DNA_polymerase_X_family

Length=575
Score $
Sequences producing significant alignments: (Bits) Va$

ref>WP_003045585.1| DNA polymerase III [Thermus aquaticus] 1141 0$
ref>YP_006058593.1| DNA polymerase IV [Thermus thermophilus JL-18] 1014 0$
ref>WP_018112351.1| DNA polymerase III [Thermus igniterrae] 1012 0$
ref>WP_008632639.1| DNA polymerase III [Thermus sp. RL] 1010 0$
ref>YP_005640696.1| PHP domain-containing protein [Thermus therm… 1009 0$
ref>WP_011173218.1| DNA polymerase III [Thermus thermophilus HB27] 1001 0$
ref>YP_004758.1| DNA-dependent DNA polymerase beta chain [Thermu… 1001 0$

`

Thanks for your email, Scott.

I was wondering if for some reason SS is incompatible with the new BLAST+ 2.2.30, but that’s not the case. SS works fine here (using nr database amongst others) using BLAST+ 2.2.30.

Did you truncate the error output in your original post? If so, could you please post the full output?

– Priyam

Hi Anurag, yes, I'll post the full error. The problem seems to lie not in SequenceServer itself but in blastdbcmd's %t format specifier. Here's the full output after starting SequenceServer:
scott@biolinux3:~$ sequenceserver

== Initializing SequenceServer...
/var/lib/gems/1.9.1/gems/sequenceserver-0.8.7/lib/sequenceserver/helpers.rb:118: warning: Insecure world writable dir /home/scott/software in PATH, mode 040777
I, [2014-12-06T12:59:07.048654 #3814] INFO -- : Found blastn at /home/scott/software/ncbi-blast230+/bin/blastn
I, [2014-12-06T12:59:07.048854 #3814] INFO -- : Found blastp at /home/scott/software/ncbi-blast230+/bin/blastp
I, [2014-12-06T12:59:07.048907 #3814] INFO -- : Found blastx at /home/scott/software/ncbi-blast230+/bin/blastx
I, [2014-12-06T12:59:07.048982 #3814] INFO -- : Found tblastn at /home/scott/software/ncbi-blast230+/bin/tblastn
I, [2014-12-06T12:59:07.049047 #3814] INFO -- : Found tblastx at /home/scott/software/ncbi-blast230+/bin/tblastx
I, [2014-12-06T12:59:07.049110 #3814] INFO -- : Found blastdbcmd at /home/scott/software/ncbi-blast230+/bin/blastdbcmd
I, [2014-12-06T12:59:07.049177 #3814] INFO -- : Found makeblastdb at /home/scott/software/ncbi-blast230+/bin/makeblastdb
I, [2014-12-06T12:59:07.049232 #3814] INFO -- : Found blast_formatter at /home/scott/software/ncbi-blast230+/bin/blast_formatter
F, [2014-12-06T12:59:07.458557 #3814] FATAL -- : Fail: Error parsing blast databases.
Tried: '/home/scott/software/ncbi-blast230+/bin/blastdbcmd -recursive -list /home/scott/blast_databases -list_outfmt "%p %f %t" 2>&1'
It crashed with the following error: 'Protein /home/scott/blast_databases/nr All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects
BLAST Database error: CSeqDBAtlas::MapMmap: While mapping file [/home/scott/blast_databases/nr.nin] with 0 bytes allocated, caught exception:
NCBI C++ Exception:
    T0 "/home/scott/software/ncbi-blast-2.2.30+-src/c++/src/objtools/blast/seqdb_reader/seqdbatlas.cpp", line 152: Error: BLASTDB::ncbi::SeqDB_ThrowException() - Validation failed: [end <= file_size] at /home/scott/software/ncbi-blast-2.2.30+-src/c++/src/objtools/blast/seqdb_reader/seqdbatlas.cpp:506

'
Try reformatting databases using makeblastdb.

I get the same error if I run the blastdbcmd command with all three format specifiers, but using only the first two, "%p %f", works fine. Also, I did recompile from source on this system, and the new blastp and other commands work, but blastdbcmd still pukes on the three-specifier list.

Hope that helps! I'm stymied!

Ok. I don’t think this is a bug in SequenceServer that needs fixing.

%t is a valid list_outfmt format specifier in BLAST+ 2.2.30. If there’s indeed an error with the database, as BLAST complains (“Database error: …”), I think %p and %f shouldn’t work either. Can you check blastdbcmd’s response to other list_outfmt options?
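
One way to run that check is to try each specifier on its own. The sketch below is only an illustration: the function name `probe_list_outfmt` and the `BLASTDBCMD` variable are made up here, and it assumes a failure shows up either as a non-zero exit status or as a "BLAST Database error" line in the output.

```shell
# Path to blastdbcmd; override if it is not on PATH (variable name is an assumption).
BLASTDBCMD="${BLASTDBCMD:-blastdbcmd}"

# probe_list_outfmt DIR [SPEC...]
# Runs blastdbcmd once per format specifier and prints "ok" or "FAILED" for each.
probe_list_outfmt() {
    db_dir="$1"
    shift
    # With no specifiers given, test the three that SequenceServer 0.8.7 uses.
    if [ "$#" -eq 0 ]; then
        set -- %p %f %t
    fi
    for spec in "$@"; do
        # Capture stdout and stderr together, remembering the exit status.
        if out=$("$BLASTDBCMD" -recursive -list "$db_dir" -list_outfmt "$spec" 2>&1); then
            status=0
        else
            status=$?
        fi
        if [ "$status" -ne 0 ] || printf '%s\n' "$out" | grep -q 'BLAST Database error'; then
            printf '%s FAILED\n' "$spec"
        else
            printf '%s ok\n' "$spec"
        fi
    done
}
```

For example, `probe_list_outfmt /home/scott/blast_databases` would test %p, %f and %t in turn; further specifiers can be passed as extra arguments.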

I have three suggestions to get SequenceServer working:

  1. If BLAST+ 2.2.29 or lower works, use that for now.
  2. See if pre-compiled BLAST+ binaries work. I suggest this because after compiling 2.2.30 from source on my system, blastdbcmd -help segfaults while everything else works normally. On the other hand, the pre-compiled executables from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ just work.
  3. Remove ‘ %t’ from line 62 of /var/lib/gems/1.9.1/gems/sequenceserver-0.8.7/lib/sequenceserver/helpers.rb. SequenceServer should still work, except that it will display the paths to the BLAST databases in the search interface rather than pretty titles.

Of course, option 3 is the least preferred.

– Priyam

Hi Anurag, I now think it must be something in the database itself: precompiled 2.2.29 and 2.2.30+ both give the same error, and blastdbcmd of both versions runs normally with any of the other options (%p, %f). Only when I include %t do I get the error. I will download fresh copies of all the volumes and try again, but thanks for your help!
cheers- Scott

Thanks for debugging this together, Scott.

Regarding downloading the nt and nr dbs: I have often run into problems with the downloaded dbs when I fetched them through an HTTP proxy. If you are using a proxy, see if you can bypass it when downloading the dbs from NCBI.
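
A quick way to catch a damaged download before unpacking is to compare each archive against its checksum. This is a rough sketch, assuming the `.md5` file for each volume was downloaded into the same directory as the archive, that it is in the usual `md5sum` format, and that GNU `md5sum` is installed; the function name `verify_blast_archives` is made up for illustration.

```shell
# verify_blast_archives DIR
# Checks every *.tar.gz in DIR that has a matching *.tar.gz.md5 file beside it.
# Returns non-zero if any archive fails its checksum.
verify_blast_archives() {
    dir="$1"
    failed=0
    for sum in "$dir"/*.tar.gz.md5; do
        [ -e "$sum" ] || continue   # the glob matched nothing
        # md5sum -c re-hashes the file named inside the .md5 file, so run it
        # from within DIR where that relative name resolves.
        if (cd "$dir" && md5sum -c "$(basename "$sum")" >/dev/null 2>&1); then
            printf 'OK      %s\n' "${sum%.md5}"
        else
            printf 'CORRUPT %s\n' "${sum%.md5}"
            failed=1
        fi
    done
    return "$failed"
}
```

Running `verify_blast_archives /path/to/downloads` lists each archive as OK or CORRUPT, so only the bad volumes need to be fetched again.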

– Priyam

Hey Scott,

I was just able to reproduce a similar error -

$ blastdbcmd -recursive -list /public/permtestdb -list_outfmt "%p %f %t"

BLAST Database error: CSeqDBAtlas::MapMmap: While mapping file [/public/permtestdb/SI2.2.3.fa.pin] with 0 bytes allocated, caught exception:
NCBI C++ Exception:
T0 "/private/tmp/blast-ZEYm3t/ncbi-blast-2.2.30+-src/c++/src/corelib/ncbifile.cpp", line 5375: Error: ncbi::CMemoryFileMap::x_Open() - CMemoryFile: Cannot memory map file "/public/permtestdb/SI2.2.3.fa.pin"

For me too it works without %t.

I traced it to incorrect permissions on the database files (.nsd, .nhr, .nin, etc.). Perhaps that is the case for you too?
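
To look for this kind of problem, something like the following can be run against the database directory. It is a minimal sketch under a few assumptions: the function name `check_blast_db_files` is made up, the extension list covers only the common nucleotide and protein database files, and it only flags files that are unreadable by the current user or completely empty, so a partially truncated file would not be caught.

```shell
# check_blast_db_files DIR
# Prints database files in DIR that are unreadable or empty.
# Returns non-zero if any such file is found.
check_blast_db_files() {
    dir="$1"
    bad=0
    # n* extensions belong to nucleotide databases, p* to protein databases.
    for f in "$dir"/*.nhr "$dir"/*.nin "$dir"/*.nsq "$dir"/*.nsd \
             "$dir"/*.phr "$dir"/*.pin "$dir"/*.psq "$dir"/*.psd; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if [ ! -r "$f" ]; then
            printf 'unreadable: %s\n' "$f"
            bad=1
        elif [ ! -s "$f" ]; then
            printf 'empty: %s\n' "$f"
            bad=1
        fi
    done
    return "$bad"
}
```

Note that it should be run as the same user that starts SequenceServer, since readability depends on who is asking.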

– Priyam

Hi Anurag, that is the most likely explanation. I went ahead and downloaded fresh copies of nt and nr, unpacked them right on the target machine, and can now load SequenceServer and run BLAST searches! (I’m using my compiled copy since I need the multithreading support for the BLAST searches: 64 cores ;))
cheers- Scott