Parsing "PCR_primers" tag from GenBank file

Horacio Montenegro
    Hi,

    I am trying to parse a GenBank file to extract primers and other
info, and to output it as tab-separated fields. However, some records
have two "PCR_primers" tags, and their values are printed with no tab
between them. The "satellite" tag also occurs twice, but those values
are correctly separated by tabs. How can I output each primer pair as
a separate field?

    Thanks, Horacio

    One example of a record with two "PCR_primers" tags is Accession
GQ344853, GI 282937571. Below is a code snippet that reproduces the
behaviour:

#!/usr/bin/env perl
use Bio::DB::EUtilities;
use Bio::SeqIO;
my $seqio_object = Bio::SeqIO->new(-file => 'GQ344853.gb' );
while (my $seq = $seqio_object->next_seq) {
    print $seq->primary_id, "\t", $seq->length, "\t";
    for my $feat_object ($seq->get_SeqFeatures) {
        print $feat_object->get_tag_values("satellite"), "\t"
            if $feat_object->has_tag("satellite");
        # Two PCR_primers values in one feature come out with no tab between them.
        print $feat_object->get_tag_values("PCR_primers"), "\t"
            if $feat_object->has_tag('PCR_primers');
    }
    print "\n";
}
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: Parsing "PCR_primers" tag from GenBank file

Roy Chaudhuri-3
Hi Horacio,

The two "satellite" tags in GQ344853 are in different features, so your
loop prints them in separate iterations, each followed by a tab. The two
PCR_primers tags are both in the same feature (source). get_tag_values
(and get_tagset_values, which is similar but doesn't throw an error if
the tag isn't found) returns a list with one element per occurrence of
the tag in the feature, so you need to loop over that list if you want
to separate the values. Your original code passed the list directly to
print, which writes the elements one after the other with nothing in
between (unless $, is set).
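
The print behaviour described above can be seen without BioPerl at all. The following is only a minimal sketch in plain Perl; the primer strings are made-up placeholders, not the values stored in GQ344853, and the variable names are chosen just for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up stand-ins for the two values of a doubled PCR_primers tag.
my @primers = (
    'fwd_seq: aaaa, rev_seq: tttt',
    'fwd_seq: cccc, rev_seq: gggg',
);

# print @primers behaves like join('', @primers) while $, is unset:
# the two values run together.
my $run_together = join('', @primers);

# An explicit join puts a tab between the values.
my $tab_separated = join("\t", @primers);

print "$run_together\n";
print "$tab_separated\n";
```

The first line of output shows the two values fused, which is what the original script produced; the second shows them as two tab-separated fields.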

Here's a modification of your code which should be closer to what you want:

#!/usr/bin/env perl
use strict;
use warnings FATAL=>qw(all);
use Bio::SeqIO;
my $seqio_object = Bio::SeqIO->new(-file => 'GQ344853.gb' );
while (my $seq = $seqio_object->next_seq) {
    print $seq->primary_id, "\t", $seq->length, "\n";
    for my $feat_object ($seq->get_SeqFeatures) {
        for my $tag (qw(satellite PCR_primers)) {
            for my $value ($feat_object->get_tagset_values($tag)) {
                print "$tag\t$value\n";
            }
        }
    }
}
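
If one tab-separated line per record is preferred, as in the original question, a variant along the following lines might work. It is a sketch rather than a tested solution: the helper names tag_values and record_line are hypothetical, and taking the GenBank file name from the command line is an assumption made here so that the helpers can be loaded without BioPerl being touched.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: every value of the given tags from one feature,
# in tag order. Tags the feature lacks contribute nothing.
sub tag_values {
    my ($feat, @tags) = @_;
    return map { $feat->has_tag($_) ? $feat->get_tag_values($_) : () } @tags;
}

# Hypothetical helper: one tab-separated line for a sequence record.
sub record_line {
    my ($seq) = @_;
    my @fields = ($seq->primary_id, $seq->length);
    push @fields, tag_values($_, qw(satellite PCR_primers))
        for $seq->get_SeqFeatures;
    return join("\t", @fields) . "\n";
}

# Parse a file only when a name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $seqio_object = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $seqio_object->next_seq) {
        print record_line($seq);
    }
}
```

Saved under some name of your choosing and run with GQ344853.gb as its argument, this should print the ID, the length, and then each satellite and PCR_primers value as its own field. Unlike the version above, it no longer shows which tag a value came from.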

Cheers,
Roy.



In reply to Horacio Montenegro's message of 14/06/2015 00:34.

Re: Parsing "PCR_primers" tag from GenBank file

Horacio Montenegro
    Hi Roy,

    Thanks, with your hints I was able to retrieve the information I
want, the way I want it. Sadly, your hints also led me to discover that
these records are so messy I will have to review them all by hand.
Still, that is better than proceeding with wrong data.

    Thanks again, cheers, Horacio

In reply to Roy Chaudhuri's message of Sun, Jun 14, 2015 at 6:24 PM.