Downloading genbank (full) format


Cacau Centurion
Hi,

I would like to download sequences in GenBank (full) format in batch. What should the format be (see the code below)?

I tried 'gb', but I got records in the regular GenBank format, and sometimes the full genome sequence is not downloaded.


#code
use Bio::SeqIO;

# $out: name of the downloaded file; $format: the format name in question
my $seqin = Bio::SeqIO->new(-file   => $out,
                            -format => $format,
                            );


Yours,
Cacau

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: Downloading genbank (full) format

Peter Cock
Hi Cacau,

It seems "gb" is just an alias for "genbank", but as you've
noticed some records are more like manifests for how to
build up the full record from its parts using a CONTIG
entry.
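
To check whether a downloaded record is one of these manifest-style entries, a minimal sketch in plain Perl (no BioPerl needed) could inspect the raw flat-file text. The function name is_contig_only and the pattern used to recognise sequence lines are assumptions made for illustration; they are not part of BioPerl.

```perl
use strict;
use warnings;

# Return 1 if a GenBank flat-file record has a CONTIG line but no
# sequence data under ORIGIN, i.e. it only describes how to build
# the sequence from its parts. Return 0 otherwise.
sub is_contig_only {
    my ($record) = @_;

    # A CONTIG line starts in column 1, like every GenBank keyword.
    my $has_contig = $record =~ /^CONTIG\s/m;

    # Sequence lines follow ORIGIN and start with a position number
    # followed by the bases.
    my $has_sequence = $record =~ /^ORIGIN.*\n\s*\d+\s+[a-zA-Z]/m;

    return ($has_contig && !$has_sequence) ? 1 : 0;
}
```

A record for which this returns 1 would need to be fetched again in the full format to get the actual sequence.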

There is also "gbwithparts" which in the NCBI web interface
is "Genbank (full)", but there can be subtle errors with the
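processing done at the NCBI end.

http://blastedbio.blogspot.co.uk/2012/03/missing-external-exons-in-genbank-with.html
http://blastedbio.blogspot.co.uk/2012/04/missing-feature-locations-in-genbank.html

Another possible but not yet confirmed issue:
http://lists.open-bio.org/pipermail/biopython/2015-September/015746.html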

Peter


Re: Downloading genbank (full) format

Andreas Leimbach-2
Hi Cacau, Peter,

Your code doesn't show how you're downloading the sequences from
NCBI. If you're using EUtils, this issue might be related.
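
For reference, the download format is chosen in the request to NCBI, not when the file is parsed. A minimal sketch, assuming the download goes through NCBI's EFetch service directly: the helper name efetch_url and the accession numbers are placeholders for illustration, and the actual HTTP request is left out.

```perl
use strict;
use warnings;

# Base URL of NCBI's EFetch service.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an EFetch URL for a list of nucleotide accessions.
# $rettype is 'gb' for the regular GenBank format or 'gbwithparts'
# for what the NCBI web interface calls "GenBank (full)".
sub efetch_url {
    my ($rettype, @ids) = @_;
    my %param = (
        db      => 'nuccore',
        id      => join(',', @ids),
        rettype => $rettype,
        retmode => 'text',
    );
    # Sort the keys so the URL is the same on every run.
    my $query = join '&', map { $_ . '=' . $param{$_} } sort keys %param;
    return $EFETCH . '?' . $query;
}

# Placeholder accessions; replace with the records you need.
print efetch_url('gbwithparts', 'NC_000913.3', 'NC_002695.2'), "\n";
```

With 'gb' in place of 'gbwithparts', the same URL requests the regular GenBank format, which may contain only a CONTIG entry for some records.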

Cheers,
Andreas

--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: [hidden email]

On 23.09.2015 09:36, Peter Cock wrote:

> Hi Cacau,
>
> It seems "gb" is just an alias for "genbank", but as you've
> noticed some records are more like manifests for how to
> build up the full record from its parts using a CONTIG
> entry.
>
> There is also "gbwithparts" which in the NCBI web interface
> is "Genbank (full)", but there can be subtle errors with the
> processing done at the NCBI end.
>
> http://blastedbio.blogspot.co.uk/2012/03/missing-external-exons-in-genbank-with.html
> http://blastedbio.blogspot.co.uk/2012/04/missing-feature-locations-in-genbank.html
>
> Another possible but not yet confirmed issue:
> http://lists.open-bio.org/pipermail/biopython/2015-September/015746.html
>
> Peter

Re: Downloading genbank (full) format

Andreas Leimbach-2
Sorry, I forgot to include the link:
https://github.com/bioperl/bioperl-live/issues/89

Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: [hidden email]

On 23.09.2015 10:08, Andreas Leimbach wrote:

> Hi Cacau, Peter,
>
> your code doesn't specify how you're downloading your sequences from
> NCBI. Thus, this issue might be related if you're using Eutils.
>
> Cheers,
> Andreas

Re: Downloading genbank (full) format

Carnë Draug-2
In reply to this post by Cacau Centurion
On 22 September 2015 at 21:57, Cacau Centurion
<[hidden email]> wrote:

> Hi,
>
> I would like to download sequences in genbank (full) format in batch. What
> should the format be (see the codes)?
>
> I tried 'gb' but got sequences in genbank format. Sometimes the full genome
> sequences might not be downloaded.
>
>
> #code
> $seqin = Bio::SeqIO->new(-file   => $out,
>                                 -format => $format,
>                                 );
>

Take a look at the bp_genbank_ref_extractor script to see how to do it [1].
Alternatively, you can just use that program: it batch downloads
all the results of a GenBank search (see the examples in its
documentation [2]).
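
If you write your own batch download instead, one common approach is to split the list of IDs into smaller groups and send one request per group. The sketch below is a generic illustration in plain Perl and is not taken from bp_genbank_ref_extractor; the function name batches and the batch size are assumptions.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size IDs each.
# Returns a list of array references, one per batch.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be a positive integer\n"
        unless defined $size && $size =~ /^[1-9]\d*$/;

    my @out;
    # splice removes up to $size IDs from the front on each pass.
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Placeholder IDs; each batch would become one download request.
for my $batch (batches(2, qw(id1 id2 id3 id4 id5))) {
    print join(',', @$batch), "\n";
}
```

Each printed line is a comma-separated ID list, which is the form a single request for several records would typically take.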


[1] https://github.com/bioperl/Bio-EUtilities/blob/master/bin/bp_genbank_ref_extractor
[2] https://github.com/bioperl/Bio-EUtilities/blob/master/bin/bp_genbank_ref_extractor#L249