Sequence server and scientific computing clusters

Tim_Fallon · March 17, 2015, 11:12pm

Hi there,

Nice bit of software! Couple of questions related to implementation:

How does sequence server actually interface with Blast+? Parses Blast output files? Pipes from stdin?
Is it possible to have the Blast+ searches be run on a scientific computing cluster? For example: LSF (bsub blastp …)

Best,
-Tim

Anurag_Priyam · March 18, 2015, 7:09am

Thanks!

Based on user input, SequenceServer constructs a command, just as you would without SequenceServer (e.g. blastp -query foo.fa -db "bar.fa baz.fa"), which is then executed in the shell with due security considerations. The output, in BLAST Archive format (-outfmt 11), is redirected to a file. We then obtain XML output from the archive file using blast_formatter (again, the output is redirected to a file). We parse the XML and generate the HTML ourselves. The same archive file is used to generate the XML and tabular reports for download.
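To make the sequence of steps concrete, here is a minimal sketch of the same file-based pipeline run by hand. It is not SequenceServer's actual code: the function name run_blast_pipeline and the output file suffixes are hypothetical, and it uses the -out option where SequenceServer redirects output. The BLAST+ options themselves (-outfmt 11 for the archive, blast_formatter -archive with -outfmt 5 for XML and -outfmt 6 for tabular) are standard.

```shell
#!/usr/bin/env sh
# Sketch only: mirrors the steps described above, not SequenceServer's code.
# run_blast_pipeline and the .asn/.xml/.tsv suffixes are hypothetical names.

run_blast_pipeline() {
    query=$1   # FASTA file holding the query sequences
    dbs=$2     # space-separated database names, e.g. "bar.fa baz.fa"
    out=$3     # prefix for the output files

    # 1. Run the search once and keep the result as a BLAST archive.
    blastp -query "$query" -db "$dbs" -outfmt 11 -out "$out.asn" || return 1

    # 2. Convert the archive to XML (this is what gets parsed into HTML).
    blast_formatter -archive "$out.asn" -outfmt 5 -out "$out.xml" || return 1

    # 3. Convert the same archive to a tabular report for download.
    blast_formatter -archive "$out.asn" -outfmt 6 -out "$out.tsv"
}
```

For example, run_blast_pipeline foo.fa "bar.fa baz.fa" result would leave result.asn, result.xml and result.tsv in the current directory. Keeping the archive means the search itself runs only once, however many report formats are requested later.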

We used pipes in the very early days of SequenceServer but soon felt that they were unreliable, so we no longer use them. Query sequences are written to a file and passed to BLAST using the -query option instead of being piped via stdin. Output is written to a file, which is subsequently read, instead of being read from a pipe.

For antgenomes.org, which is hosted on a thin server but runs BLAST on a 48-core fat machine (a designated node on QMUL’s HPC cluster), we simply replace the BLAST+ binaries with a shim that executes BLAST on the fat machine via ssh:

#!/usr/bin/env sh

ssh /path/to/blastn "$@"
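One caveat with a shim like the one above: ssh joins its arguments with spaces and hands the result to the remote shell, so an argument that contains spaces (such as -db "bar.fa baz.fa") gets split again on the other side. Below is a sketch of a shim that quotes each argument first. BLAST_HOST, its example value, and the helper names quote_arg, remote_command and run_remote_blastn are all hypothetical; substitute your own host and path.

```shell
#!/usr/bin/env sh
# Sketch only. BLAST_HOST and REMOTE_BLASTN are placeholders, not real values.
BLAST_HOST=${BLAST_HOST:-user@fat-node.example.org}
REMOTE_BLASTN=${REMOTE_BLASTN:-/path/to/blastn}

# Wrap one argument in single quotes so the remote shell sees it as one word.
# Embedded single quotes are rewritten as '\''.
quote_arg() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the full remote command line from the shim's own arguments.
remote_command() {
    cmd=$(quote_arg "$REMOTE_BLASTN")
    for arg in "$@"; do
        cmd="$cmd $(quote_arg "$arg")"
    done
    printf '%s\n' "$cmd"
}

# Run BLAST on the remote machine; ssh exits with the remote exit status.
run_remote_blastn() {
    ssh "$BLAST_HOST" "$(remote_command "$@")"
}

# In a real shim the last line would be:
# run_remote_blastn "$@"
```

This assumes that the query and output files are visible at the same paths on both machines (for example via a shared filesystem), since only the command line travels over ssh.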

The same scheme can be used to queue jobs if the queuing system allows waiting on a job id. I guess the corresponding shim would look something like:

#!/usr/bin/env sh

job_id=
qsub -N $job_id /path/to/blastn "$@"

qsub -hold_jid $job_id

(or use -sync option maybe)
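Here is what the -sync variant might look like on Grid Engine. This is an untested sketch: the function name run_queued_blastn and the job name are hypothetical, and your site may need extra options (queue, resources, environment).

```shell
#!/usr/bin/env sh
# Sketch only, assuming Grid Engine's qsub. REMOTE_BLASTN is a placeholder.
REMOTE_BLASTN=${REMOTE_BLASTN:-/path/to/blastn}

run_queued_blastn() {
    job_name="blast_$$"   # hypothetical job name based on the shim's PID

    # -sync y  wait for the job to finish and exit with its exit status
    # -b y     treat the command as a binary rather than a job script
    # -cwd     run the job in the current working directory
    qsub -sync y -b y -cwd -N "$job_name" "$REMOTE_BLASTN" "$@"
}

# In a real shim the last line would be:
# run_queued_blastn "$@"
```

Two things to watch for: qsub prints its own status messages, which could end up mixed into anything read from the shim's stdout, and the query and output files have to live on a filesystem that the compute nodes can see. On LSF, bsub -K is, as far as I know, the closest equivalent to -sync y, but please check your cluster's documentation.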

If waiting on a job id is not possible in the conventional UNIX sense, this will not work, because SequenceServer processes requests synchronously. That bit is due to change soon, though.

I hope this helps. Please let us know if you use the above suggestions to integrate SequenceServer into an HPC system. We will be happy to help along the way.

– Priyam