Link creator very slow

Hi!

I’m using Sequenceserver 0.8.6, and everything works fine so far. I have a db made from a normal FASTA file that I BLAST against, and it works if there are only a few hits. However, if there are many hits (say 200), the BLAST itself is fast (5 seconds) but then the “hyperlink creator” takes forever, around 1-3 seconds per link. This makes Sequenceserver too slow to use. It makes no difference whether or not I use -parse_seqids when making the database. I would like to keep the sequence links in the results if possible.
Any way to speed this up?

Cheers,
Till

Hi Till,

Thanks for your interest in seqserv. We are aware of this problem and are currently working to fix it. Are you observing that 0.8.6 is slower than previous versions?
ben

Hi Ben,

No, I don’t remember that being a problem in older versions, but I never tested it specifically, so I can’t really say.

Cheers,

Till

OK, thanks. Interesting.

I have also noticed this performance issue. For what it’s worth, the delay seems to be a function not of the links but of the length and number of alignments that SequenceServer is trying to display. Long alignments take several seconds each, and if 100 of them are requested (the default), it is a long wait (and the request may time out).

One workaround is to set -num_alignments 3 (or some other small number). Then the results come back quickly, even if there are hundreds of hyperlinked results.
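For anyone driving BLAST+ directly, here is a minimal sketch of the same workaround: a command line that caps the number of alignments shown. The helper name blast_command and its defaults are hypothetical; -query, -db and -num_alignments are standard BLAST+ options.

```ruby
# Hypothetical helper: build a BLAST+ argument list with a small
# -num_alignments cap, so far fewer alignment lines are produced.
# Only -query, -db and -num_alignments are real BLAST+ options here.
def blast_command(program, query_file, db, num_alignments: 3)
  [program,
   "-query", query_file,
   "-db", db,
   "-num_alignments", num_alignments.to_s]
end

cmd = blast_command("blastp", "query.fa", "nr")
puts cmd.join(" ")
# => blastp -query query.fa -db nr -num_alignments 3
```

To actually run it (with BLAST+ on the PATH), the array can be passed to Ruby's standard library as `Open3.capture3(*cmd)` after `require "open3"`.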

Thanks,
Owen

Hi guys,

Thanks for your patience on this. I believe I’ve fixed this problem. The cause was this:

For each line of the BLAST output, seqserv was parsing the string and then using string concatenation (i.e. output += now_formatted_line) to build up the HTML that eventually gets displayed. However, when the output has a large number of lines (e.g. my test blasting a 16S sequence against greengenes gave upwards of 26k lines of output), this gets slow: in Ruby, a += b creates a brand-new string and copies everything accumulated so far, so the total work grows roughly with the square of the output size. You might have noticed that it actually gets slower as it goes: the "DEBUG – : Added link for:… " logging lines get printed more and more slowly.
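A small Ruby illustration (not SequenceServer’s actual code) of the difference between growing a string with += and with <<. The method names are made up for the example; the point is that String#+ returns a new object every time, while String#<< appends to the existing one.

```ruby
# `a += b` means `a = a + b`; String#+ always returns a NEW string,
# so every step copies everything accumulated so far.
def concat_with_plus(lines)
  output = ""
  lines.each { |line| output += line } # full copy on every iteration
  output
end

# String#<< appends to the same String object in place.
def concat_with_shovel(lines)
  output = +"" # unary + gives a mutable string even if literals are frozen
  lines.each { |line| output << line }
  output
end

# Counts how often the accumulator ends up being a different object.
def new_objects_created(lines, op)
  output = +""
  count = 0
  lines.each do |line|
    result = op == :plus ? output + line : output << line
    count += 1 unless result.equal?(output)
    output = result
  end
  count
end

lines = Array.new(1_000) { |i| "line #{i}\n" }
puts new_objects_created(lines, :plus)   # => 1000
puts new_objects_created(lines, :shovel) # => 0
```

Both methods produce the same string, but for n lines of average length L the += version copies on the order of L·n²/2 bytes in total, which matches the symptom of each log line appearing more slowly than the last.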

My solution was to record each parsed line in a new array, and then call Array#join once at the end. This probably isn’t the fastest solution (modifying the input array in place where necessary might be faster, and being less verbose with the logging is probably also a good idea), but on my test set it seems much improved.
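A minimal sketch of the "collect into an array, join once" approach, plus the in-place variant mentioned as a possible further optimisation. The add_link formatter and the "/sequence?id=" URL are hypothetical stand-ins for SequenceServer’s real link formatting; only the accumulation strategy is meant to be illustrative.

```ruby
# HYPOTHETICAL formatter: hyperlink the ID on alignment header lines (">id ...").
# The "/sequence?id=" URL is an assumption, not SequenceServer's real route.
def add_link(line)
  id = line[/\A>(\S+)/, 1]
  return line if id.nil?
  line.sub(id) { %(<a href="/sequence?id=#{id}">#{id}</a>) }
end

# Build a new array of formatted lines and join it a single time at the end.
def format_blast_output(raw)
  raw.each_line.map { |line| add_link(line) }.join
end

# Variant closer to the in-place idea: overwrite the input array's
# elements with map! and join once.
def format_lines_in_place!(lines)
  lines.map! { |line| add_link(line) }
  lines.join
end

raw = ">seq1 example hit\nQuery  1  MFADRW  6\n"
puts format_blast_output(raw)
```

Either way each line is formatted once and the final HTML is assembled in a single pass, so the cost grows linearly with the size of the output rather than quadratically.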

I’m hoping you two (and anyone else, of course) might be able to confirm this for me before I push out a new rubygem. You can get the code using git:

$ git clone -b faster_parsing https://github.com/wwood/sequenceserver

Thanks for being good citizens. As usual, this optimisation showed me that my preconceived ideas about why the code was slow were way off…
ben

I’d be more than happy to test this, but is there any way you could make a beta gem for us to install, so we don’t have to build it ourselves? It would probably be a better deployment test anyway, right? :smiley:

Hi,

OK, I’ve released a new gem, sequenceserver-beta, containing these changes. To install it:

$ gem uninstall sequenceserver

$ gem install sequenceserver-beta

For the sake of keeping development simple, I don’t want to make any guarantees about what will happen to this gem in the future, so even if you wish to stay on the bleeding edge this gem may not always contain the newest code. The gem is just a way of distributing the code in situations like this.

ben

Tried installing the beta, and I got “ERROR: Could not find a valid gem ‘sequenceserver-beta’ (>0) in any repository”

OK, sorry, my mistake. Try this:

$ gem install sequenceserver --pre

You may as well uninstall the sequenceserver-beta gem.

Ummm…Ben? Has anyone told you that YOU ARE A GOD? :slight_smile:

I ran some simple tests: I ran a COX1 protein sequence fragment against the non-redundant protein database. Here are my results:

MFADRWLFSTNHKDIGtLyLLFGAWAGVLGtALsLLIRAELGQPGNLLGNDHIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSLLLLLASAMVEAGAGTGWTVYPPLAGNYSHPGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMTQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPtGVKVFSWLATLHGsNMKWSAAVLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFIHWFPLFSGYTLDQTYAKIHFTIMFIGVNLTFFPQHFLGLSGMPRRYSDYPDAYTTWNILSSVGSFISLTAVMLMIFMIWEAFASKRKVLMVEEPSMNLEWLYGCPPPYHTFEEPVYMKS-

BEFORE (production version of SequenceServer):

Blastp:  2.2 minutes to execute (AMD FX8350 8-core cpu)
SequenceServer:  2.5 minutes to parse results

AFTER (beta version of SequenceServer):

Hah, I’m not sure about that, I’m just following what ruby-prof tells me.

Relatedly, do you notice a speedup with the newly released blast 2.2.29+ when using more than one thread? The release notes mention something about better use of multiple threads that I couldn’t quite follow; have you noticed whether it is better in practice?

I haven’t upgraded to the newer blast yet, but I’m eager to try it out when I get a chance. I’ve noticed that with 2.2.26 blastp is pretty good about using multiple threads, but blastn isn’t (e.g. 4 processors used vs 1). I have to admit that previously I’ve just installed blast through aptitude, so this will be my first time compiling it from source. Is there a tweak to make sure it compiles for multiple cores?

And: I just got BLAST+ 2.2.29 installed and did some quick tests. It doesn’t appear to be any faster than 2.2.26, with the same basic run profile: blastp is very good about multithreading, blastn not so much. Not sure why there would be such a disparity…

OK, that’s disappointing. Thanks anyway.

I’ll release a new proper version of seqserv in a few days in the absence of bug reports.

OK, the code from the beta gem is now in the mainline and released as version 0.8.7.

So:

$ gem update sequenceserver

and you should be set.

ben