API

Richard_Adams · June 16, 2020, 7:50pm

Hi
I’m a developer investigating making Blast searches available programmatically, and came across your project, which is great, with the Docker image it was super-easy to get started in minutes.

I’ve got a few disparate comments and queries:

Compared to version 1, version 2 runs much slower on my Ubuntu 18 laptop running SequenceServer on Docker 19.0.3. I installed a small database (S. pombe cDNA library, about 9Mb) and in version 1 searches were instantaneous. In version 2.0 beta, the same query takes 20-30 seconds to complete (this happens both in the web page and using curl). The initial response with the job URL in the Location header is quick, but the subsequent call to get the job is slow. If there are any log files or similar you could point me to, I would be happy to send them.
If I query curl http://localhost:4567/jobid.json, the response is in JSON but the response header says the content is text/html; this could confuse some clients:
Content-Type: text/html;charset=utf-8
I’m discovering the API by trial and error, by inspecting the traffic from the web page. If documentation doesn’t exist already, I would be happy to submit a PR with a document containing some snippets for invoking the API through a client like curl, if that would be helpful.
Is it possible to get a list of running jobs? For example, to be able to throttle requests to the server depending on the number of running jobs. Also, I read somewhere that you have to restart the application after updating the BLAST databases. It would be good to wait until there are 0 jobs before doing that, but on a server used by many people I’m not sure how you would know that.
Thanks for your time, and apologies if these are known issues or non-issues.
Richard

Anurag_Priyam · June 19, 2020, 1:38pm

Hi Richard,

Thanks for your message.

Slower speeds using Docker have been reported by others (either on the mailing list or on GitHub), but I don’t have much to offer in this regard, especially because I cannot reproduce the issue. The biggest relevant difference between 1.0 and 2.0 is that 2.0 writes temporary files to ~/.sequenceserver whereas 1.0 did all that in /tmp. We made this change to be able to keep the BLAST jobs around for a while. So if I had to investigate this, I would try to understand how Docker manages /tmp and /home or /root. In a non-Docker context, /tmp is sometimes RAM-based, which makes it much faster. In the context of Docker, the container filesystem could be slower than usual. I would look at which container filesystem is being used on your setup, what the different options are, and whether there are known problems with some of them that could explain this.
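One rough way to test the filesystem theory is to time small writes in each location from inside the container. The following is only a sketch: the helper name `time_small_writes`, the file count, and the file size are illustrative assumptions, not anything SequenceServer provides.

```ruby
require 'benchmark'
require 'tmpdir'

# Time writing `count` small files into a scratch directory created under `dir`.
# The scratch directory is removed afterwards. Returns elapsed seconds (Float).
def time_small_writes(dir, count: 200, bytes: 4096)
  payload = 'x' * bytes
  Dir.mktmpdir('ss-bench', dir) do |scratch|
    Benchmark.realtime do
      count.times do |i|
        File.open(File.join(scratch, "f#{i}"), 'w') do |f|
          f.write(payload)
          f.fsync # force the data to the underlying filesystem
        end
      end
    end
  end
end
```

Comparing `time_small_writes('/tmp')` with `time_small_writes(File.expand_path('~/.sequenceserver'))` in the same container should show whether one location is markedly slower. It measures raw write latency only, so a small difference here would not rule out other causes.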

A pull request documenting the API would be much appreciated. If you would like to correct the Content-Type response header of jobid.json as well, that would be very welcome. If not, I would be happy to take care of it; thank you for reporting it.
Note that documentation work for 2.0 is happening in the gh-pages2.0 branch.
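Until the header is corrected, a client can work around it by attempting to parse the body as JSON regardless of the declared type. A minimal sketch, where `parse_job_response` is a hypothetical client-side helper and not part of SequenceServer:

```ruby
require 'json'

# Try to parse `body` as JSON even when the Content-Type header says otherwise.
# Returns { data:, mislabelled: } on success, or nil if the body is not JSON.
def parse_job_response(content_type, body)
  data = JSON.parse(body)
  declared_json = content_type.to_s.downcase.include?('json')
  { data: data, mislabelled: !declared_json }
rescue JSON::ParserError
  nil
end
```

The `mislabelled` flag lets the client log the mismatch without failing, so the workaround keeps working unchanged once the server sends the right header.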

There is no API at the moment to report the list of running jobs, though many of the pieces are already there in the backend. See job.rb and pool.rb. A Job doesn’t have a running status, though; it only records whether it is completed or not. You could consider adding a running status, or just subtract the list of pending jobs (@queue in pool.rb) from all non-completed jobs (Job.all.reject(&:done?)).
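The subtraction approach might look roughly like this. The `Job` struct below is a hypothetical stand-in written for illustration (only `done?` is taken from the description above), and `running_jobs` is an assumed helper name, not existing SequenceServer code.

```ruby
# Hypothetical stand-in for SequenceServer's Job class; only `done?` mirrors the real one.
Job = Struct.new(:id, :done) do
  def done?
    done
  end
end

# Running jobs = jobs that are not completed, minus those still waiting in the queue.
def running_jobs(all_jobs, pending_jobs)
  all_jobs.reject(&:done?) - pending_jobs
end

jobs = [Job.new('a', true), Job.new('b', false), Job.new('c', false)]
pending = [jobs[2]] # job 'c' is queued but not started
puts running_jobs(jobs, pending).map(&:id).inspect # prints ["b"]
```

In the real code the pending list would come from the pool's queue, so any such helper would need access to the pool as well as to the jobs.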

Regarding throttling: there’s a built-in thread pool, so only the allowed number of jobs run concurrently; the rest are queued automatically by SequenceServer.
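For readers unfamiliar with the pattern, a fixed-size worker pool can be sketched as below. This is a generic illustration under the hypothetical name `TinyPool`; it is not SequenceServer's pool.rb and may differ from it in important ways.

```ruby
# A minimal fixed-size worker pool: at most `size` tasks run at once,
# and any further tasks wait in the queue until a worker is free.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until work arrives; a nil entry tells the worker to exit.
        while (task = @queue.pop)
          task.call
        end
      end
    end
  end

  # Enqueue a block to be run by the next free worker.
  def submit(&task)
    @queue << task
  end

  # Let already-queued work drain, then stop and join all workers.
  def shutdown
    @workers.size.times { @queue << nil }
    @workers.each(&:join)
  end
end
```

Because the queue is first in, first out, the stop markers pushed by `shutdown` are only reached after every previously submitted task has been picked up.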

Implementing a restart that waits for jobs to finish should just require listening for a restart signal and calling the ‘shutdown’ method on the job pool object (see the ‘shutdown’ method in pool.rb).
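A sketch of the signal-handling side, assuming the pool's `shutdown` blocks until running jobs have finished. `FakePool`, `install_restart_handler`, and the choice of the USR1 signal are all illustrative assumptions; only the `shutdown` method name comes from pool.rb.

```ruby
# Hypothetical stand-in for the job pool; only the `shutdown` name is taken from pool.rb.
class FakePool
  attr_reader :stopped

  def shutdown
    @stopped = true
  end
end

# Install a handler for `signal` that only records the request (trap handlers
# should do as little as possible). Returns a lambda that performs the shutdown
# if a restart was requested, and reports whether it did so.
def install_restart_handler(pool, signal = 'USR1')
  requested = false
  Signal.trap(signal) { requested = true }
  lambda do
    return false unless requested
    pool.shutdown # assumed to wait for running jobs to complete
    true
  end
end
```

The application's main loop would call the returned lambda periodically and, once it returns true, re-exec or exit so that a supervisor can start a fresh process with the updated databases.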

Priyam

Richard_Adams · June 27, 2020, 9:47am

Thanks for the fulsome reply, I’ll try to get round to this in the next week or so.