Link creator very slow

Till · December 4, 2013, 11:13am

Hi!

I’m using Sequenceserver 0.8.6, and everything works fine so far. I have a db built from a normal FASTA file that I blast against, and it works if there are only a few hits. However, if there are many hits (like 200 or so), the BLAST itself is fast (5 seconds) but then the “hyperlink creator” takes forever, like 1-3 seconds per link. This makes Sequenceserver too slow to use. It makes no difference whether or not I use -parse_seqids when making the database. I would like to have the sequence links in the results if possible.
Any way to speed this up?

Cheers,
Till

Ben_Woodcroft · December 4, 2013, 11:38am

Hi Till,

Thanks for your interest in seqserv. We are aware of this problem and are currently working to fix it. Are you observing that 0.8.6 is slower than previous versions?
ben

Till · December 4, 2013, 12:44pm

Hi Ben,

No, I don’t remember that being a problem in older versions, but I never tested it specifically, so I can’t really say.

Cheers,

Till

Ben_Woodcroft · December 4, 2013, 2:16pm

OK, thanks. Interesting.

Owen · January 14, 2014, 12:07am

I have also noticed this performance issue. For what it’s worth, the delay seems to be a function not of the links but of the length and number of alignments that SequenceServer is trying to display. Long alignments take several seconds each, and if 100 of them are requested (as is the default) then it is a long wait (and the request possibly times out).

One workaround is to set
-num_alignments 3
or some other small number. Then the results come back quickly, even if there are 100s of hyperlinked results.

Thanks,
Owen

Ben_Woodcroft · January 19, 2014, 6:28am

Hi guys,

Thanks for your patience on this. I believe I’ve fixed this problem. The cause was this:

For each line of the blast output, seqserv was parsing the string, and then using string concatenation (i.e. output += now_formatted_line) to build up the HTML that eventually gets displayed in the output. However, when the number of lines in the output is large (e.g. my test blasting a 16S sequence against greengenes gave upwards of 26k lines of output), this gets slow. You might have noticed that it actually gets slower - the "DEBUG – : Added link for:… " logging lines get printed more and more slowly.

My solution was to record each parsed line in a new array, and then call Array#join at the end once. This probably isn’t the fastest solution (maybe modifying the input array in-place when necessary might be faster, and not being as verbose with the logging is probably also a good idea), but on my test set it seems much improved.
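
A minimal Ruby sketch of the difference described above, not SequenceServer's actual code: `format_line`, `build_with_concat` and `build_with_join` are hypothetical names made up for illustration, and the 5,000-line input is an arbitrary stand-in for real BLAST output.

```ruby
require 'benchmark'

# Hypothetical stand-in for the per-line parsing/formatting step.
def format_line(line)
  "<span>#{line}</span>\n"
end

# Old approach: `+=` allocates a new String on every iteration and copies
# everything accumulated so far, so total work grows roughly quadratically.
def build_with_concat(lines)
  output = ''
  lines.each { |line| output += format_line(line) }
  output
end

# New approach: collect the formatted pieces in an array and join once.
def build_with_join(lines)
  formatted = lines.map { |line| format_line(line) }
  formatted.join
end

# Arbitrary illustrative input; real BLAST output would be parsed instead.
lines = Array.new(5_000) { |i| "hit #{i}" }

concat_time = Benchmark.realtime { build_with_concat(lines) }
join_time   = Benchmark.realtime { build_with_join(lines) }
puts format('+= : %.4fs, Array#join: %.4fs', concat_time, join_time)
```

Both methods return the same string; only the way it is built differs. Appending in place with `String#<<` would be another option for avoiding the repeated copying, though that is not what the fix described here does.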

I’m hoping you two (and anyone else, of course) might be able to confirm this for me before I push out a new rubygem. You can get the code using git:

$ git clone -b faster_parsing https://github.com/wwood/sequenceserver

Thanks for being good citizens. As usual, this optimisation showed me that my preconceived thoughts on why the code was slow were way off…
ben

Wolfgang_Rumpf · January 20, 2014, 4:25pm

I’d be more than happy to test this, but - is there any way you could make a beta gem for us to install to get around us having to do the compilation? It would probably be a better deployment test anyway, right?

Ben_Woodcroft · January 21, 2014, 10:40am

Hi,

OK, I’ve released a new gem, sequenceserver-beta, with these new changes in it. To install:

$ gem uninstall sequenceserver

$ gem install sequenceserver-beta

For the sake of keeping development simple, I don’t want to make any guarantees about what will happen to this gem in the future, so even if you wish to stay on the bleeding edge this gem may not always contain the newest code. The gem is just a way of distributing the code in situations like this.

ben

Wolfgang_Rumpf · January 21, 2014, 2:07pm

Tried installing the beta, and I got “ERROR: Could not find a valid gem ‘sequenceserver-beta’ (>0) in any repository”

Ben_Woodcroft · January 21, 2014, 10:02pm

OK, sorry, my mistake. Try this

$ gem install sequenceserver --pre

You may as well uninstall the sequenceserver-beta gem.

Wolfgang_Rumpf · January 21, 2014, 11:52pm

Ummm…Ben? Has anyone told you that YOU ARE A GOD?

I ran some simple tests - I ran a COX1 protein sequence fragment against the non redundant protein database…here are my results:

MFADRWLFSTNHKDIGtLyLLFGAWAGVLGtALsLLIRAELGQPGNLLGNDHIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSLLLLLASAMVEAGAGTGWTVYPPLAGNYSHPGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMTQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPtGVKVFSWLATLHGsNMKWSAAVLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFIHWFPLFSGYTLDQTYAKIHFTIMFIGVNLTFFPQHFLGLSGMPRRYSDYPDAYTTWNILSSVGSFISLTAVMLMIFMIWEAFASKRKVLMVEEPSMNLEWLYGCPPPYHTFEEPVYMKS-

BEFORE (production version of SequenceServer):

Blastp:  2.2 minutes to execute (AMD FX8350 8-core cpu)

SequenceServer:  2.5 minutes to parse results

AFTER (beta version of SequenceServer):

Ben_Woodcroft · January 22, 2014, 1:45am

Hah, I’m not sure about that, I’m just following what ruby-prof tells me.

Relatedly, do you notice a speedup with the newly released blast 2.2.29+ using >1 thread? In the release notes it suggested something about better using multiple threads that I couldn’t quite follow, but have you noticed that it is better in practice?

Wolfgang_Rumpf · January 22, 2014, 3:26pm

I haven’t upgraded to the newer blast yet, but I’m eager to try it out when I get a chance. I’ve noticed that with 2.2.26 blastp is pretty good about using multiple threads, but blastn isn’t (e.g. 4 processors used vs 1). I have to admit that previously I’ve just installed blast through aptitude, so this will be my first time compiling it from source. Is there a tweak to make sure it compiles for multiple cores?

Wolfgang_Rumpf · January 22, 2014, 9:20pm

And - just got blast+ 2.2.29 installed and did some quick tests. It doesn’t appear to be any faster than 2.2.26; same basic run profile - blastp seems very good about multithreading, blastn not so much. Not sure why there would be such a disparity….

Ben_Woodcroft · January 22, 2014, 11:12pm

OK, that’s disappointing. Thanks anyway.

I’ll release a new proper version of seqserv in a few days in the absence of bug reports.

Ben_Woodcroft · January 25, 2014, 4:44am

OK, the beta version of the gem is now in the mainline and has been released as version 0.8.7.

So:

$ gem update sequenceserver

and you should be set.

ben