length of hit->description

Antony03
Hello,

When I use hit->description I get a truncated result like this:

repB putative plasmid replication protein R

I tried hit->description(500) and I get the same result.

Is there something wrong?

Thanks

Re: length of hit->description

Fields, Christopher J
You’ll have to be more explicit.  What is the analysis (BLAST, FASTA, BLAT, etc)?  What is the expected result?  

Most of the above parsers will take the hit name as the first set of non-whitespace characters, and anything after that is put into the description.
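A minimal Perl sketch of the rule Chris describes, assuming a single hit line as input. split_hit_line is a hypothetical helper for illustration, not a BioPerl function; the real SearchIO parsers deal with much more of each report's layout. The sample line is the (already truncated) hit line from a FASTA report.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the rule described above: the hit name is
# the first run of non-whitespace characters, and whatever follows it
# (minus surrounding whitespace) is the description.
sub split_hit_line {
    my ($line) = @_;
    my ( $name, $desc ) = $line =~ /^\s*(\S+)\s*(.*?)\s*$/;
    return ( $name, $desc );
}

my ( $name, $desc ) =
  split_hit_line('gi|20800434|ref|NP_620820.1| putative DNA-binding repl');
print "name: $name\n";           # gi|20800434|ref|NP_620820.1|
print "description: $desc\n";    # putative DNA-binding repl
```

Note that the parser can only return what the report contains: if the program cut the description short, the description field is short too.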

chris

On Jul 24, 2014, at 12:04 PM, Antony03 <[hidden email]> wrote:

> [...]


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: length of hit->description

Fields, Christopher J
That is unusual.  The only recourse is to submit a bug report to GitHub and attach an example report giving the error.  My feeling is that this might be a change in the output for FASTA.

chris

On Jul 24, 2014, at 12:54 PM, Antony Vincent <[hidden email]> wrote:

> Hello,
>
> Thank you for the quick answer.
>
> [...]



Re: length of hit->description

Francisco J. Ossandón
In reply to this post by Antony03
Hello,
Can you give more context for the problem? Are you talking about a BLAST
result or an HMMER result?

It's important because the default BLAST output format uses multiple lines
when a hit description is too long, but HMMER truncates the description to
what fits on a single line, so BioPerl can't recover more than what appears
in the output. If you are talking about HMMER, some months ago I applied a
patch that recovers as much of the description as possible:
https://github.com/bioperl/bioperl-live/commit/74c88d254215dd1879e23488cd308033d0fce6f9

Also, what's your purpose with "hit->description(500)"? Passing an
argument would change the description to '500' instead of returning the
original description...

Cheers,

Francisco J. Ossandon


Re: length of hit->description

Antony03
Here is a part of the code:
use strict;
use warnings;
use Bio::SearchIO;

# fasta parameters
# ($library and $length are defined elsewhere in the full script; reading
#  them from the command line here is only an assumption so this part runs
#  on its own)
my ( $library, $length ) = @ARGV;
my $query = 'MyORF.tfa';
my $fasta = 'fasta35';

# The fasta parsing part
my $command = "$fasta -b 1 $query $library";

# Use 'or' rather than '||': with '||' the die() binds to the command string
# and never runs, so a failed open would go unnoticed.
open( my $fh, '-|', $command )
  or die("cannot run fasta cmd of $command: $!\n");

my $searchio = Bio::SearchIO->new( -format => 'fasta', -fh => $fh );
while ( my $result = $searchio->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            if ( $hsp->length('total') > $length * 0.85 ) {
                if ( $hsp->frac_conserved >= 0.65 ) {
                    print $hit->description(500), "\n";
                }
            }
        }
    }
}

The correct result should be:
repB putative plasmid replication protein RepB

And I get:
repB putative plasmid replication protein R

As for the '500', it comes from this part of the documentation:

 Usage     : $hit_object->description( [integer] );
 Purpose   : Set/Get a description string for the hit.
             This is parsed out of the "Query=" line as everything after
             the first chunk of non-whitespace text. Use $hit->name()
             to get the first chunk (the ID of the sequence).
 Example   : $description = $hit->description;
           : $desc_60char = $hit->description(60);
 Argument  : Integer (optional) indicating the desired length of the
           : description string to be returned.
 Returns   : String consisting of the hit's description or undef if not set.

You can find it here: http://doc.bioperl.org/releases/bioperl-1.2/Bio/Search/Hit/BlastHit.html

But maybe this only applies to BLAST and not to FASTA.

Re: length of hit->description

Torsten Seemann
In reply to this post by Antony03
my $fasta   = 'fasta35';

Firstly, I would recommend upgrading to fasta36, which was released in 2011, I think. It's focused on batch and command-line usage rather than interactive usage.
 
my $command = "$fasta -b 1 $query $library";

By default fasta36 truncates the descriptions (and I think fasta35 did too), so I think you need to add "-L" to the options ("long library descriptions").
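A small sketch of building the search command with "-L" added, as Torsten suggests, assuming fasta36 is on your PATH. fasta_command and the library file name are hypothetical. Returning a list instead of one string lets you use the list form of a piped open, which runs the program directly without going through a shell.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for a FASTA search with "-L"
# (long library descriptions) added. The defaults for the program name and
# the -b value mirror the command used in this thread.
sub fasta_command {
    my ( $query, $library, %opt ) = @_;
    my $program = $opt{program} // 'fasta36';
    my $best    = $opt{best}    // 1;
    return ( $program, '-L', '-b', $best, $query, $library );
}

# 'library.fna' is a placeholder for your own library file.
my @cmd = fasta_command( 'MyORF.tfa', 'library.fna' );
print join( ' ', @cmd ), "\n";    # prints "fasta36 -L -b 1 MyORF.tfa library.fna"
```

The list can then be opened with open( my $fh, '-|', @cmd ) or die "cannot run @cmd: $!\n"; and $fh handed to Bio::SearchIO->new( -format => 'fasta', -fh => $fh ) as before.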

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA


Re: length of hit->description

Francisco J. Ossandón
In reply to this post by Antony03
From what I can see, that page is the documentation for version 1.2, which
is really, really old. The current BlastHit module no longer contains that
subroutine
(https://github.com/bioperl/bioperl-live/blob/master/Bio/Search/Hit/BlastHit.pm),
which means that the "description" subroutine being used is probably the one
from the GenericHit module
(https://github.com/bioperl/bioperl-live/blob/master/Bio/Search/Hit/GenericHit.pm):
=head2 description
Title : description
Usage : $desc = $hit->description();
Function: Retrieve the description for the hit
Returns : a scalar string
Args : [optional] scalar string to set the descrition
=cut

So if you are using an updated version, "->description(500)" replaces
the description instead of doing what you expect. Which BioPerl version are
you using?
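A minimal Perl sketch of the difference, using a hypothetical ToyHit class (not BioPerl's code) that only follows the GenericHit contract quoted above: an optional argument sets the description. The safer way to get a shortened description is to read it without arguments and truncate it yourself; short_description is likewise a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hit object, NOT BioPerl's code. It only follows
# the GenericHit contract quoted above: an optional argument *sets* the
# description, and with no argument the accessor just returns it.
package ToyHit;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub description {
    my ( $self, $value ) = @_;
    $self->{description} = $value if defined $value;
    return $self->{description};
}

package main;

# Read the description without arguments and truncate it yourself;
# substr() returns the whole string if it is already shorter than $max.
sub short_description {
    my ( $hit, $max ) = @_;
    my $desc = $hit->description() // '';
    return substr( $desc, 0, $max );
}

my $hit = ToyHit->new(
    description => 'repB putative plasmid replication protein RepB' );
print short_description( $hit, 13 ), "\n";    # prints "repB putative"
print $hit->description, "\n";                # description is unchanged
```

Truncating on your side cannot make a description longer, though: if the search program already cut it, no argument to the accessor will bring the missing text back.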



Re: length of hit->description

Smithies, Russell
In reply to this post by Torsten Seemann

You beat me to it :)

Even fasta36 truncates the description, which running fasta from the command line would have shown you straight away.

E.g.:

intrepid$ fasta36 -b1 MyORF.tfa  test.fna
# fasta36 -b1 MyORF.tfa test.fna
FASTA searches a protein or DNA sequence data bank
version 36.3.5c Dec, 2011(preload8)
Please cite:
W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query: MyORF.tfa
  1>>>test - 39 aa
Library: test.fna
       93 residues in     1 sequences

Statistics: (shuffled [495]) MLE statistics: Lambda= 0.1960;  K=0.09883
statistics sampled from 1 (1) to 495 sequences
Algorithm: FASTA (3.7 Nov 2010) [optimized]
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
ktup: 1, E-join: 1 (1), E-opt: 0.2 (1), width:  16
Scan time:  0.040

The best scores are:                                      opt bits E(1)
gi|20800434|ref|NP_620820.1| putative DNA-binding  (  93)  193 58.0 1.2e-14

>>gi|20800434|ref|NP_620820.1| putative DNA-binding repl  (93 aa)
initn: 157 init1: 157 opt: 193  Z-score: 295.2  bits: 58.0 E(1): 1.2e-14
Smith-Waterman score: 193; 87.2% identity (87.2% similar) in 39 aa overlap (6-39:55-93)

                                        10        20             30
test                            MPAENQGMKYREIAEEMEISTGAVGRL-----KHD
                                     ::::::::::::::::::::::     :::
gi|208 RTIRNIVAESRDSYQARAAERRDTAVKLREQGMKYREIAEEMEISTGAVGRLLHDAKKHD
           30        40        50        60        70        80

--Russell

Re: length of hit->description

Antony03
In fact, I tried with fasta36 but I got this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unrecognized alignment line (3) '>--'
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.16.2/Bio/Root/Root.pm:472
STACK: Bio::SearchIO::fasta::next_result /sw/lib/perl5/5.16.2/Bio/SearchIO/fasta.pm:1148
STACK: ./Auto_Annot.pl:83

Line 83 is this line from my earlier code:

while( my $result = $searchio->next_result ) {

And I have no problem with fasta35.

Thank you for the -L tip... I hadn't thought that could be the problem.