I have deployed SequenceServer 1.0.12 for testing on a virtual machine (2 CPUs, 8 GB RAM) and ran some benchmarks in debug mode, using a FASTA query of 11 peptide sequences against a database of 31,300 peptide sequences.
Here are the time marks:
- 53s (50%): blastp starts and the query file appears in the tmpdir.
- 1min 17s (75%): blast_formatter starts.
- 1min 19s (75%): the query file disappears from the tmpdir.
- 1min 47s (100%): total time. The job ends and the results are displayed in the web interface.
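To see where the time goes, the marks above can be turned into per-phase durations. This is a small Python sketch; the helpers `to_seconds` and `phase_breakdown` are hypothetical, written only for this arithmetic, and it assumes the BLAST phase runs from blastp starting to the query file being removed.

```python
def to_seconds(mark: str) -> int:
    """Parse time marks such as '53s' or '1min 17s' into seconds."""
    total = 0
    for part in mark.split():
        if part.endswith("min"):
            total += int(part[:-3]) * 60
        elif part.endswith("s"):
            total += int(part[:-1])
        else:
            raise ValueError(f"unrecognised time component: {part!r}")
    return total


def phase_breakdown(blast_start: str, blast_end: str, job_end: str) -> dict:
    """Split a job into pre-BLAST, BLAST and post-BLAST phases.

    Returns {phase name: (seconds, percentage of total time)}.
    """
    start, end, total = map(to_seconds, (blast_start, blast_end, job_end))
    phases = {
        "pre-BLAST": start,
        "BLAST (blastp + blast_formatter)": end - start,
        "post-BLAST": total - end,
    }
    return {name: (secs, round(100 * secs / total, 1)) for name, secs in phases.items()}


if __name__ == "__main__":
    # Marks from the debug run: blastp starts, query file removed, job ends.
    for name, (secs, pct) in phase_breakdown("53s", "1min 19s", "1min 47s").items():
        print(f"{name:34s} {secs:4d} s  {pct:5.1f} %")
```

With the reported marks this gives 53 s (49.5%) before BLAST, 26 s (24.3%) for blastp plus blast_formatter, and 28 s (26.2%) afterwards, which matches the rough 50/25/25 split.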
It seems that the BLAST tasks themselves (blastp and blast_formatter) take only about 25% of the total time, yet the total reaches a hefty 1min 47s, whereas the same query run against the same database with the legacy wwwblast displays its results in only about 31s.
I cannot figure out which tasks run before the BLAST tasks start (taking a non-negligible 50% of the total time) and, to a lesser extent, after they finish.
Is this normal? Is it possible to optimize these “pre-” and “post-BLAST” tasks? Could you please explain what the code is supposed to do? Maybe there is a bottleneck on my machine…
Thanks in advance for your help.
Not much is done before BLAST. From the moment you click the ‘BLAST’ button, the following happens:
- The query sequence is uploaded to a temporary file.
- Two empty temporary files are created to record BLAST’s stdout and stderr.
- BLAST is run.
The cost of the second step is negligible. The first step may take some time depending on the size of the query sequence, the network, etc., but for 11 peptide sequences it should be negligible too. So what you are seeing is an anomaly.
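A rough Python sketch of these three pre-BLAST steps follows. It is not SequenceServer's code (which is Ruby); the function names are hypothetical, and having blastp write a BLAST archive (`-outfmt 11`) for a later blast_formatter run is an assumption suggested by the separate blast_formatter step in the debug log.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def prepare_job(query_fasta: str, workdir: Path) -> dict:
    """Steps 1-2: save the query to a temp file, create empty stdout/stderr files."""
    paths = {}
    for name, suffix in (("query", ".fa"), ("stdout", ".out"), ("stderr", ".err")):
        fd, path = tempfile.mkstemp(suffix=suffix, dir=str(workdir))
        with open(fd, "w") as handle:  # open() takes ownership of fd and closes it
            if name == "query":
                handle.write(query_fasta)
        paths[name] = Path(path)
    return paths


def blastp_command(query: Path, database: str, archive: Path) -> list:
    """Step 3 command line; -outfmt 11 writes a BLAST archive for blast_formatter."""
    return ["blastp", "-query", str(query), "-db", database,
            "-outfmt", "11", "-out", str(archive)]


def run_blastp(paths: dict, database: str) -> int:
    """Step 3: run blastp, capturing its stdout/stderr in the files from prepare_job."""
    if shutil.which("blastp") is None:
        raise RuntimeError("blastp is not on PATH")
    archive = paths["query"].with_suffix(".asn")
    with open(paths["stdout"], "w") as out, open(paths["stderr"], "w") as err:
        result = subprocess.run(blastp_command(paths["query"], database, archive),
                                stdout=out, stderr=err)
    return result.returncode
```

In this sketch the first two steps are plain local file I/O; only the last one does substantial work.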
After BLAST finishes:
- The XML output is parsed into memory.
- HTML output is generated from the parsed data and sent back to the browser.
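The two post-BLAST steps can be sketched in Python with the standard library's `xml.etree.ElementTree`, assuming BLAST XML output (`-outfmt 5`) with its usual `Iteration`/`Hit`/`Hsp` elements. This only illustrates the technique and is not SequenceServer's Ruby implementation; the dictionary layout and the HTML fragment are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def parse_blast_xml(xml_text: str) -> list:
    """Step 4: load BLAST XML (-outfmt 5) into plain Python structures."""
    root = ET.fromstring(xml_text)
    queries = []
    for iteration in root.iter("Iteration"):
        hits = []
        for hit in iteration.iter("Hit"):
            hsps = [
                {"bit_score": float(h.findtext("Hsp_bit-score")),
                 "evalue": float(h.findtext("Hsp_evalue"))}
                for h in hit.iter("Hsp")
            ]
            hits.append({"id": hit.findtext("Hit_id", ""),
                         "def": hit.findtext("Hit_def", ""),
                         "hsps": hsps})
        queries.append({"query": iteration.findtext("Iteration_query-def", ""),
                        "hits": hits})
    return queries


def render_html(queries: list) -> str:
    """Step 5: turn the parsed data into an HTML fragment for the browser."""
    parts = []
    for q in queries:
        parts.append(f"<h2>{html.escape(q['query'])}</h2><ul>")
        for hit in q["hits"]:
            # Best (smallest) e-value among this hit's HSPs, if any.
            best = min(h["evalue"] for h in hit["hsps"]) if hit["hsps"] else None
            parts.append(f"<li>{html.escape(hit['id'])} {html.escape(hit['def'])}"
                         f" (best e-value: {best})</li>")
        parts.append("</ul>")
    return "".join(parts)
```

Because `ET.fromstring` builds the whole tree in memory, parsing time and memory in this sketch grow with the size of the XML output (number of hits and HSPs), not with the number of query sequences.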
I wouldn’t be surprised if our XML-to-HTML conversion is slower than wwwblast. However, 28 seconds (from 1min 19s to 1min 47s) to parse the XML output for 11 peptide sequences, again, seems like an anomaly. For example, parsing a 96 MB XML output containing 329 hits and 32,697 HSPs takes barely 10 seconds on my Mac (2.7 GHz, 8 GB), with a peak memory usage of 350 MB (XML parsing is single-threaded, so the number of CPUs doesn’t matter).
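To check whether the virtual machine itself is slow at this kind of work, one could time XML parsing and measure peak memory on both machines. The sketch below uses Python's `time.perf_counter` and `tracemalloc`; it is a stand-in for, not a measurement of, SequenceServer's Ruby parser, and the file path passed on the command line is hypothetical.

```python
import sys
import time
import tracemalloc
import xml.etree.ElementTree as ET


def time_parse(path: str) -> tuple:
    """Parse a BLAST XML file; return (number of HSPs, wall-clock seconds)."""
    start = time.perf_counter()
    root = ET.parse(path).getroot()
    hsp_count = sum(1 for _ in root.iter("Hsp"))
    return hsp_count, time.perf_counter() - start


def peak_parse_memory(path: str) -> tuple:
    """Parse the file again under tracemalloc; return (HSPs, peak MiB)."""
    tracemalloc.start()
    try:
        root = ET.parse(path).getroot()
        hsp_count = sum(1 for _ in root.iter("Hsp"))
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return hsp_count, peak / 2**20


if __name__ == "__main__":
    xml_path = sys.argv[1]  # e.g. an XML file written by blast_formatter -outfmt 5
    hsps, seconds = time_parse(xml_path)
    _, peak_mib = peak_parse_memory(xml_path)
    print(f"{hsps} HSPs parsed in {seconds:.2f} s, peak {peak_mib:.1f} MiB (Python heap)")
```

Timing and memory are measured in separate passes because `tracemalloc` slows down allocation-heavy code. It also counts only memory allocated through Python, so its peak is not directly comparable to a process-level figure such as the 350 MB above.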
I’m not sure whether your virtual machine is VirtualBox-based, but in several different contexts I have noticed that programs can slow down considerably inside VirtualBox. The performance of the Ruby interpreter itself can also be a factor; I used Ruby 2.3 for the numbers above.
I hope this helps.
It seems that the considerable delay before the BLAST tasks started was due to the lack of Internet access, although I am not sure why: when I ran these tests, I did not have access to the Internet.
Now that I have Internet access again, the BLAST tasks start immediately and the total time (51s) is much shorter, but still longer than with wwwblast (perhaps because of the formatting of the results for display?).