Invalid EMBL files generated in rare circumstances; line wrapping


Adam Sjøgren-2
  Hi.

If you craft a tag on a feature sneakily (or if you are unlucky)
Bio::SeqIO will create invalid EMBL, separating the "/" from the
qualifier name:

    ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 4 BP.
    XX
    AC   unknown;
    XX
    XX
    XX
    FH   Key             Location/Qualifiers
    FH
    FT   CDS             1..4
    FT                   /
    FT                   note="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    FT                   X"
    XX
    SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
         actg                                                                      4
    //

In this example "/" and "note" end up on separate lines, which is
invalid; at least, BioPerl itself does not accept the result.
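
One way to spot the problem in generated output is to look for
feature-table lines that contain nothing but the slash. The following is
a minimal sketch and not part of BioPerl: the helper name
find_split_qualifiers is made up for illustration, and the sample text is
a shortened excerpt of the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the 1-based line
# numbers of FT lines whose only content is a lone "/", i.e. a qualifier
# slash that was split from its qualifier name.
sub find_split_qualifiers {
    my ($embl_text) = @_;
    my @bad;
    my $line_no = 0;
    for my $line (split /\n/, $embl_text) {
        $line_no++;
        push @bad, $line_no if $line =~ m{^FT\s+/\s*\z};
    }
    return @bad;
}

# Shortened excerpt of the invalid feature table shown above.
my $sample = join "\n",
    'FT   CDS             1..4',
    'FT                   /',
    'FT                   note="XXXX',
    'FT                   X"',
    '';

my @bad = find_split_qualifiers($sample);
print "Bare '/' on FT line(s): @bad\n" if @bad;
```

The same check could be run on the string produced by the script below
before the record is written anywhere else.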

Here is a script to create the above output (BioPerl 1.6.901 used):

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Bio::Seq::RichSeq;
    use Bio::SeqFeature::Generic;
    use IO::String;
    use Bio::SeqIO;

    # Build a 4 bp sequence with a single CDS feature spanning all of it.
    my $seq=Bio::Seq::RichSeq->new(-display_id=>'TEST', -seq=>'actg');
    my $cds=Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', -start=>1, -end=>4);
    $cds->add_tag_value(note=>'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX X');
    $seq->add_SeqFeature($cds);

    # Write the record as EMBL into an in-memory string and print it.
    my $string;
    my $str=IO::String->new($string);
    my $io=Bio::SeqIO->new(-fh=>$str, -format=>'embl');
    $io->write_seq($seq);
    print $string;

Whether this happens depends on the position of the space in the note.

Maybe there is a bug lurking in the line wrapping/formatting code
somewhere...

Does this sound like a bug to anyone else?

  Best regards,

    Adam

--
                                                          Adam Sjøgren
                                                    [hidden email]

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: Invalid EMBL files generated in rare circumstances; line wrapping

Fields, Christopher J
I can reproduce that on the master branch.  I think it’s a weird
side effect of the text wrapping; if you remove the space at the end of
the string of X’s and let the module wrap the line itself, it works
fine.  Frankly, I don’t think we’ve ever run into it before.

If possible, can you file it as a bug on GitHub?

chris

On Sep 29, 2014, at 10:17 AM, Adam Sjøgren <[hidden email]> wrote:

> [...]



Re: Invalid EMBL files generated in rare circumstances; line wrapping

Adam Sjøgren
"Fields, Christopher J" <[hidden email]> writes:

> I can reproduce that on master branch. It’s a weird
> consequence/side-effect of the text wrapping I think; if you remove
> the space at the end of the string of X’s and allow the module to text
> wrap the line it works fine. I don’t think we’ve ever run into it
> frankly.

Yes, it looks like a corner case that I was "unlucky" enough to hit.

> If possible can you file it as a bug on GitHub?

Certainly: https://github.com/bioperl/bioperl-live/issues/84


  Best regards,

    Adam

--
 "The key to performance is elegance, not battalions          Adam Sjøgren
  of special cases."                                     [hidden email]
