using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Cook, Malcolm-2
Fellow long-time BioPerlers,

I am successfully using Bio::SeqIO to convert between table format (cf. http://search.cpan.org/~cjfields/BioPerl/Bio/SeqIO/table.pm) and GenBank flatfile format.

I have Bio::SeqIO sequence format conversion wrapped in a command-line script.  The script exposes to the command line the parameters to ->new for both input and output objects through judicious use of GetOptions.  I have used this script in many conversion tasks between many different formats.
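A minimal sketch of how such a wrapper might map command-line options onto constructor arguments, assuming repeated `--in key=value` and `--out key=value` options. The helper name `seqio_args_from_argv` and the option names are hypothetical and not taken from the actual script; only the hash-valued option support in Getopt::Long is real.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not from the original script): turn repeated
# "--in key=value" / "--out key=value" options into the dash-prefixed
# argument lists that a Bio::SeqIO-style constructor expects.
sub seqio_args_from_argv {
    my @argv = @_;
    my (%in, %out);
    GetOptionsFromArray(
        \@argv,
        'in=s%'  => \%in,     # hash-valued option: --in key=value
        'out=s%' => \%out,
    ) or die "could not parse command line\n";

    # Prefix every key with a dash, e.g. display_id => -display_id.
    my @in_args  = map { ("-$_" => $in{$_}) }  sort keys %in;
    my @out_args = map { ("-$_" => $out{$_}) } sort keys %out;
    return (\@in_args, \@out_args);
}

my ($in_args, $out_args) = seqio_args_from_argv(
    qw(--in format=table --in header=1 --in display_id=1 --in seq=3
       --out format=genbank)
);
print join(' ', @$in_args),  "\n";
print join(' ', @$out_args), "\n";
```

The two array references could then be expanded into the respective `->new` calls. Because every key is passed through unchanged, any parameter the constructor accepts becomes available on the command line, which is also why a parameter the constructor does not accept cannot be reached this way.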

... except now ...

I am having trouble with reading the flatfile format.

Happily, at first, I see that -display_id and -accession_number are both parameters to Bio::SeqIO::table->new, so they are naturally exposed to the command line as `in format=table header=1 display_id=1 seq=3`.

Alas, however, -description is not a parameter to ->new.

The only way I can see to configure table.pm to take the sequence description (aka desc) from the 2nd column of my .tab file is as follows:

        $in->attribute_map({-description => 2});

... however, my trace shows that although this does set the desc attribute of the wrapped Bio::PrimarySeq to the value from column 2, using attribute_map also removes the individual values passed in for -display_id and -accession_number.

Ideally (I think) Bio::SeqIO::table->new  would take a -description=2 instead of having to call attribute_map.  

Or, Bio::SeqIO::table->new  would take  -attribute_map and even accept it as a string which gets evaluated to a hash reference, just as I see -colnames can be passed as a string evaling to an array (which I see in the unit test: http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.924/t/SeqIO/table.t).  This would allow the hash to be supplied at the command line.
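The string-to-hash idea could look roughly like the sketch below. It is an illustration only: `attribute_map_from_string` is a hypothetical helper, not part of Bio::SeqIO::table, and string `eval` executes arbitrary Perl, so it is only reasonable for trusted command-line input.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate a string such as
# '{-description => 2}' into a hash reference.
sub attribute_map_from_string {
    my ($text) = @_;

    # The leading "+" makes Perl parse "{...}" as an anonymous hash
    # rather than as a code block.
    my $map = eval "+$text";
    die "could not evaluate '$text': $@" if $@;
    die "'$text' did not evaluate to a hash reference\n"
        unless ref $map eq 'HASH';
    return $map;
}

my $map = attribute_map_from_string('{-description => 2, -display_id => 1}');
print "$_ => $map->{$_}\n" for sort keys %$map;
```

The resulting reference is the same kind of value that is passed to `attribute_map` above, so a wrapper script could accept it as a single quoted command-line argument.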

Or, am I missing something?

FWIW: I am trying to help a lab convert a few years of plasmids from DNAPlasmid to GenBank (for loading into Vector NTI), and I am passing through Bio::SeqIO::table in so doing.

Cheers, and Thanks for help and suggestions....

Malcolm Cook
Stowers Institute for Medical Research
1000 E 50th Street
Kansas City, MO 64110
(816) 926-4449
[hidden email]


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Brian Osborne-2
Malcolm,

Can you attach a file with “description” that I can use to test a fix?

Thanks again,

Brian O

> On Sep 11, 2015, at 10:16 PM, Cook, Malcolm <[hidden email]> wrote:

Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Fields, Christopher J
In reply to this post by Cook, Malcolm-2
Hi Malcolm,

Best thing would be to have a dummy example for expected input and output so it can be tested against, just to make sure things work as expected.  Could you supply that?  Certainly seems like it should be feasible.

chris

> On Sep 12, 2015, at 12:16 AM, Cook, Malcolm <[hidden email]> wrote:

Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Cook, Malcolm-2
Hi Chris, Brian, Hilmar, et al.,

Thanks for offering to consider this change.

Attached are test.tab and the converted test.tab.gb.

test.tab has three columns: n (display_id), d (definition/description), and s (sequence).

test.tab.gb has what I would hope would result from writing in genbank format after reading using:

        Bio::SeqIO->new(-file => $filename, -format => 'table', -header => 1, -display_id => 1, -accession_number => 1, -seq => 3, -desc => 2)


You may be additionally interested in the following:  
After preparing this data, I tried to round-trip it, and found the following error when trying to convert test.tab.gb back to table format:

perl -M'Bio::SeqIO'  -e '$out = Bio::SeqIO->new(-format => qq{table}); $in = Bio::SeqIO->new(-format => qq{genbank},-file=>"test.tab.gb");  while ( my $seq = $in->next_seq() ) {$out->write_seq($seq) }'  > test.tab.gb.tab

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Sorry, you cannot write to a generic Bio::SeqIO object.
STACK: Error::throw
STACK: Bio::Root::Root::throw /n/local/stage/perlbrew/perlbrew-0.43/perls/perl-5.16.1t/lib/site_perl/5.16.1/Bio/Root/Root.pm:486
STACK: Bio::SeqIO::write_seq /n/local/stage/perlbrew/perlbrew-0.43/perls/perl-5.16.1t/lib/site_perl/5.16.1/Bio/SeqIO.pm:540
STACK: -e:1

Any help much appreciated.  I do have a workaround for now, but it is a kludge....

Cheers,

Malcolm

 > -----Original Message-----
 > From: Fields, Christopher J [mailto:[hidden email]]
 > Sent: Saturday, September 12, 2015 11:12 PM
 > To: Cook, Malcolm <[hidden email]>
 > Cc: [hidden email]; Hilmar Lapp <[hidden email]>
 > Subject: Re: [Bioperl-l] using Bio::SeqIO to convert from table to genbank
 > format ..... attribute_map example

test.tab.gb (510 bytes)
test.tab (42 bytes)

Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Brian Osborne-2
Working on this ….

On Sep 14, 2015, at 10:44 AM, Cook, Malcolm <[hidden email]> wrote:

'$out = Bio::SeqIO->new(-format => qq{table});



Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Brian Osborne-2
Malcolm,

Done, in master. That “can not write to a generic …” error was due to the fact that write_seq() is not implemented for SeqIO::table, but someone forgot to put an “empty” write_seq() method into the module to catch any attempts. Fixed.
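The general pattern described here, an explicit stub that rejects writes with a format-specific message, might be sketched as follows. The package name and the message text are hypothetical, and this is not the code that was committed to BioPerl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a read-only format module; the class name
# and message are illustrative only.
package My::ReadOnlyFormat;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Explicit stub: fail with a format-specific message instead of
# falling through to a generic base-class error.
sub write_seq {
    my ($self) = @_;
    die ref($self) . " does not implement write_seq()\n";
}

package main;

my $out = My::ReadOnlyFormat->new;
eval { $out->write_seq('ACGT'); 1 } or print "caught: $@";
```

With a stub like this, the caller learns which format lacks write support rather than seeing a message about a generic object.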

Brian O.

On Sep 21, 2015, at 9:18 AM, Brian Osborne <[hidden email]> wrote:


Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Fields, Christopher J
Awesome, thanks Brian!

chris

On Sep 21, 2015, at 9:32 AM, Brian Osborne <[hidden email]> wrote:


Re: using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Mark A. Jensen

+1!

On Mon, Sep 21, 2015 at 11:04 AM, Fields, Christopher J <[hidden email]> wrote:
