Sequence retrieval issue

Hi there!

Can I get some more info on this error, please?

ERROR: incorrect number of sequences found.

Dear user,

We have found less sequence than expected.

This is likely due to a problem with how databases are formatted. Please share this text with the person managing this website so they can resolve the issue.

You requested 1 sequence with the following identifiers: 512555, from the following databases: /usr/******************/Illumina_only_AbySS-contigs.fa. But we found 0 sequence.

If sequences were retrieved, you can find them below (but some may be incorrect, so be careful!).

I presume it has something to do with how the FASTA file looks when it is formatted into a BLAST database.

Here is a snippet from the FASTA file:

>290698 57 1241
ACGTCTCGGCCTAGACGTATACTATATAAGCTCTATTCTACGGTTTAAGTCCCGTCG
>290699 29 4599
TAACTTATATCTCTTATCTCTACCTCTCG
>290701 45 1075
TAAGGAACCGACTAGTAGACTTTACTGCCGCTATAATAGGTACTA
>290702 55 202045
TTTAAATTTTTAATATTTTACAAGAAGGTAATCTGTTCTTCTTTTGGCACTGTAT


Let me preface my remarks by saying that sequence retrieval in SequenceServer (SS) is
flaky. The sequence id that needs to be retrieved is generated by
parsing the BLAST output, which differs not only from version to
version of BLAST+ but also depending on the FASTA header (if the sequence id
starts with 'gi', for example) and on the way the database was formatted
(with -parse_seqids or not, etc.). Sequence retrieval will be more robust (read:
almost zero likelihood of error) by version 1.0.

To the issue at hand now.

[...]

You requested 1 sequence with the following identifiers: 512555, from the
following databases: /usr/******************/Illumina_only_AbySS-contigs.fa.
But we found 0 sequence.

From inspecting the BLAST output, SS thinks that the id of the hit you
are trying to retrieve is 512555 and asks `blastdbcmd` to fetch it
from the database, but `blastdbcmd` couldn't.

What is your BLAST+ version? What output do the following commands yield:

$ grep '512555' Illumina_only_AbySS-contigs.fa
$ grep '>512555' Illumina_only_AbySS-contigs.fa
$ blastdbcmd -db Illumina_only_AbySS-contigs.fa -entry 512555
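For comparison, here is a small Ruby sketch of the first two checks that matches the id only as the first word of a header line. `grep '512555'`, by contrast, also matches lines where 512555 appears inside a longer number such as 1512555. The function name `find_header` and the command-line usage are hypothetical.

```ruby
# Sketch: return the FASTA header whose first word is exactly `id`, or nil.
# `lines` can be any enumerable of lines, e.g. File.foreach(path).
def find_header(lines, id)
  lines.each do |line|
    next unless line.start_with?(">")      # only header lines carry ids
    first_word = line[1..-1].split.first   # text after ">" up to the first space
    return line.chomp if first_word == id  # exact match, not a substring match
  end
  nil
end

# Hypothetical usage: ruby find_header.rb Illumina_only_AbySS-contigs.fa 512555
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  path, id = ARGV
  puts(find_header(File.foreach(path), id) || "no header with id #{id}")
end
```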

Sorry for the late reply!

Using blast-2.2.26+

$ grep '512555' Illumina_only_AbySS-contigs.fa

>512555 13389 342680 497238+,…,510027-

$ grep '>512555' Illumina_only_AbySS-contigs.fa

>512555 13389 342680 497238+,…,510027-

$ blastdbcmd -db Illumina_only_AbySS-contigs.fa -entry 512555
Error: GI 512555: OID not found

So the sequence is there. Could it be that the headers are just wonky and BLAST is unable to extract the ID?
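To see how many headers could be affected, the following Ruby sketch lists header lines whose first word consists only of digits. The "GI 512555: OID not found" error suggests that `blastdbcmd` read such an id as a GI number; treating every all-digit id as suspect is an assumption based on that error, and the function name is hypothetical.

```ruby
# Sketch: list FASTA header lines whose first word is made only of digits.
# Assumption: these are the ids BLAST+ may mistake for GI numbers.
def numeric_id_headers(lines)
  headers = lines.select { |line| line.start_with?(">") }
  headers.map(&:chomp).select { |h| h[1..-1].split.first.to_s.match?(/\A\d+\z/) }
end

# Hypothetical usage: ruby numeric_ids.rb Illumina_only_AbySS-contigs.fa
if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  hits = numeric_id_headers(File.foreach(ARGV[0]))
  puts "#{hits.size} header(s) with a purely numeric id"
  puts hits.first(5) # print a few examples
end
```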

[...]

No worries. Thanks to Ben, we now know exactly what the issue is:
https://github.com/yannickwurm/sequenceserver/issues/88

Hi Frederick,

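We hope to fix this in a future version. As Anurag mentioned, it has to do with the way BLAST interprets numbers in the FASTA identifier lines: a purely numeric identifier such as 512555 gets treated as a GI number, which is why blastdbcmd reported "GI 512555: OID not found".
I can think of two workarounds that should work in the meantime: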

  1. Prepend some letters to each of the sequence identifiers. For example, you can do that with the following command:
    ruby -pe 'gsub(">", ">abyssContig")' < Illumina_only_AbySS-contigs.fa > Illumina_only_AbySS-contigs.renamed.fa
  2. Replace the spaces ' ' with underscores '_':
    ruby -pe 'gsub(" ", "_")' < Illumina_only_AbySS-contigs.fa > Illumina_only_AbySS-contigs.renamed.fa

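Both one-liners rewrite every line of the file. That is harmless here because only header lines contain '>' and the sequence lines contain no spaces. A slightly more defensive variant of workaround 1 that touches only header lines might look like the Ruby sketch below; the `abyssContig` prefix comes from workaround 1, while the script and function names are hypothetical.

```ruby
# Sketch: prefix the id on FASTA header lines only; sequence lines pass through.
# "abyssContig" is the prefix from workaround 1; any non-numeric prefix would do.
PREFIX = "abyssContig"

def rename_header(line, prefix = PREFIX)
  line.start_with?(">") ? ">#{prefix}#{line[1..-1]}" : line
end

# Hypothetical usage:
#   ruby rename_headers.rb Illumina_only_AbySS-contigs.fa Illumina_only_AbySS-contigs.renamed.fa
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  input, output = ARGV
  File.open(output, "w") do |out|
    File.foreach(input) { |line| out.write(rename_header(line)) }
  end
end
```

Whichever approach you use, the BLAST database has to be rebuilt from the renamed file (with makeblastdb) before SequenceServer can pick up the new ids.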
With kind regards,

Yannick

Hi Yannick.

Thanks for the tip, a useful code snippet. I would have done it in Python or sed.
I haven't needed to fiddle with the ABySS assemblies for a while, but CEGMA also does not like the format of the headers, so I might just fix them.

All the best!