fasta36 bug report

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

fasta36 bug report

Antony03
Hello,

There is a problem when I try to parse a fasta36 report. I got this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unrecognized alignment line (3) '>--'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::SearchIO::fasta::next_result /usr/share/perl5/Bio/SearchIO/fasta.pm:1148
STACK: ./Auto_Annot.pl:123
-----------------------------------------------------------

There is a '>--' after each alignment in fasta36 but not in fasta35.

Consequently, I tried to parse a fasta35 alignment. There is no problem with bioperl. However, the result is clearly not the same between both (fasta35 and fasta36).

fasta35 gives only 1 alignment: http://pastebin.com/f4NdYJCt

while fasta36 gives 3 alignments: http://pastebin.com/ADeKJ4GC

What I'm doing wrong with fasta35? It is probably not normal that it misses almost 2 perfect alignments on 3.

Thanks you!
Reply | Threaded
Open this post in threaded view
|

Re: fasta36 bug report

Fields, Christopher J
Looks as if there have been major changes to fasta36, see here:


Note in particular point 2:

  • Display of all significant alignments between query and library sequence. BLAST has always displayed multiple high-scoring alignments (HSPs) between the query and library sequence; previous versions of the FASTA programs displayed only the best alignment, even when other high-scoring alignments were present. This is the major change in FASTA36. For most programs (fasta36ssearch36[t]fast[xy]36), if the library sequence contains additional significant alignments, they will be displayed with the alignment output, and as part of -m 9 output (the initial list of high scores).

    By default, the statistical threshold for alternate alignments (HSPs) is the E()-threshold / 10.0. For proteins, the default expect threshold is E()< 10.0, the secondary threshold for showing alternate alignments is thus E() < 1.0. Fror translated comparisons, the E()-thresholds are 5.0/0.5; for DNA:DNA 2.0/0.2.

    Both the primary and secondary E()-thresholds are set with the -E "prim sec" command line option. If the secondary value is betwee zero and 1.0, it is taken as the actual threshold. If it is > 1.0, it is taken as a divisor for the primary threshold. If it is negative, alternative alignments are disabled and only the best alignment is shown.

I suggest submitting this as a bug report to GitHub along with the examples (use a gist, not pastebin as the latter are not around forever).  We could add support for this (it’s not very high on our priority list of fixes), but having example files helps tremendously along that path.  We also will acept any help/patches that get this working again :)

chris

On Aug 7, 2014, at 7:47 PM, Antony03 <[hidden email]> wrote:

Hello,

There is a problem when I try to parse a fasta36 report. I got this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unrecognized alignment line (3) '>--'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::SearchIO::fasta::next_result
/usr/share/perl5/Bio/SearchIO/fasta.pm:1148
STACK: ./Auto_Annot.pl:123
-----------------------------------------------------------

There is a '>--' after each alignment in fasta36 but not in fasta35.

Consequently, I tried to parse a fasta35 alignment. There is no problem with
bioperl. However, the result is clearly not the same between both (fasta35
and fasta36).

fasta35 gives only 1 alignment: http://pastebin.com/f4NdYJCt

while fasta36 gives 3 alignments: http://pastebin.com/ADeKJ4GC

What I'm doing wrong with fasta35? It is probably not normal that it misses
almost 2 perfect alignments on 3.

Thanks you!




--
View this message in context: http://bioperl.996286.n3.nabble.com/fasta36-bug-report-tp17620.html
Sent from the Bioperl-L mailing list archive at Nabble.com.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: fasta36 bug report

Antony03
Thank you for the answer. Do you if I can get more significant results with fasta35?
Reply | Threaded
Open this post in threaded view
|

Re: fasta36 bug report

Fields, Christopher J
Depends on what you mean by ‘significant’; this is a question that probably belongs on a forum like Biostar or seqanswers.

The main thing is that the output is different enough that the latest releases will require additional work to get parsing working, but in the end may likely be more informative.  On the other hand if you have highly repetitive data then it could generate a lot of noise.  fasta35 may be completely sufficient depending what you are trying to do (which isn’t obvious here).

chris

On Aug 7, 2014, at 9:29 PM, Antony03 <[hidden email]> wrote:

> Thank you for the answer. Do you if I can get more significant results with
> fasta35?
>
>
>
> --
> View this message in context: http://bioperl.996286.n3.nabble.com/fasta36-bug-report-tp17620p17622.html
> Sent from the Bioperl-L mailing list archive at Nabble.com.
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l