Using Utilities to retrieve multiple coding sequences for identical (WP_) protein sequences

Warren Gallin
With the advent of the WP_ accession series in RefSeq, there is no longer a direct link between a single protein sequence and its encoding nucleotide sequence.

It is possible to find the multiple individual nucleotide records encoding the identical protein sequences on the Web interface through the “Identical Proteins” link, which generates a list of all of the coding sequences for the identical protein sequence.

Is there any way to work through these linkages using Bio::DB::EUtilities?

Thanks,

Warren Gallin
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: Using Utilities to retrieve multiple coding sequences for identical (WP_) protein sequences

Peter Cock
Good point. Ivan Erill asked about this on the Biopython list late
last year - presumably the same solution would apply there too?:

http://lists.open-bio.org/pipermail/biopython/2014-October/015438.html

See also:
ftp://ftp.ncbi.nlm.nih.gov/refseq/release/announcements/WP-proteins-06.10.2013.pdf

Peter

On Mon, Apr 27, 2015 at 6:35 PM, Warren Gallin <[hidden email]> wrote:

> With the advent of the WP_ accession series in RefSeq, there is no longer a direct link between a single protein sequence and its encoding nucleotide sequence.
>
> It is possible to find the multiple individual nucleotide records encoding the identical protein sequences on the Web interface through the “Identical Proteins” link, which generates a list of all of the coding sequences for the identical protein sequence.
>
> Is there any way to work through these linkages using Bio::DB::EUtilities?
>
> Thanks,
>
> Warren Gallin

Re: Using Utilities to retrieve multiple coding sequences for identical (WP_) protein sequences

Fields, Christopher J
According to that post, Ivan wasn’t able to access the data via elink (see question at bottom); any idea whether he received an answer?  I’ll have to look at whether this is possible via Bio::DB::EUtilities.

chris

> On Apr 27, 2015, at 3:52 PM, Peter Cock <[hidden email]> wrote:
>
> Good point. Ivan Erill asked about this on the Biopython list late
> last year - presumably the same solution would apply there too?:
>
> http://lists.open-bio.org/pipermail/biopython/2014-October/015438.html
>
> See also:
> ftp://ftp.ncbi.nlm.nih.gov/refseq/release/announcements/WP-proteins-06.10.2013.pdf
>
> Peter

Re: Using Utilities to retrieve multiple coding sequences for identical (WP_) protein sequences

Fields, Christopher J
Warren, Peter,

The code below works for me; you should be able to grab the IDs by iterating through the link sets that are returned (this example returns three different link sets, so you may want to check whether you need only a specific subset).

I’m guessing the Biopython version just lacked the ‘db’ setting?  Or does it default to ‘nuccore’?

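-----------------------------------
use strict;
use warnings;
use Bio::DB::EUtilities;

# Ask elink for the nuccore records linked to a single protein record.
my $eutil = Bio::DB::EUtilities->new(-eutil     => 'elink',
                                     -dbfrom    => 'protein',
                                     -db        => 'nuccore',
                                     -id        => '446211235', # WP_000289090.1
                                     -email     => '[hidden email]');

# Dump every link set returned by the query.
$eutil->print_all;
-----------------------------------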
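
For the iteration itself, something along these lines might do as a starting point. It is only a rough sketch: collect_linked_ids is a made-up helper name, not part of BioPerl, and it assumes the next_LinkSet, get_link_name and get_ids methods behave as described in the BioPerl EUtilities documentation. It groups the linked IDs by link name so the subset of interest can be picked out afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): group linked IDs by link name.
# $eutil is expected to be an elink query object such as the
# Bio::DB::EUtilities instance built in the example above.
sub collect_linked_ids {
    my ($eutil) = @_;
    my %ids_by_link;

    # Each link set holds the IDs found for one link type.
    while (my $linkset = $eutil->next_LinkSet) {
        my $name = $linkset->get_link_name;
        $name = 'unnamed' unless defined $name;
        push @{ $ids_by_link{$name} }, $linkset->get_ids;
    }
    return \%ids_by_link;
}
```

Calling `my $ids = collect_linked_ids($eutil);` in place of `$eutil->print_all` should give a hash reference keyed by link name. The values are nuccore UIDs, so fetching the actual records would presumably still need a separate efetch step.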

chris

> On Apr 30, 2015, at 7:51 PM, Fields, Christopher J <[hidden email]> wrote:
>
> According to that post, Ivan wasn’t able to access the data via elink (see question at bottom); any idea whether he received an answer?  I’ll have to look at whether this is possible via Bio::DB::EUtilities.
>
> chris