Limitations for Bio::DB::Fasta


Ki Baik
I'm trying to index a large FASTA file that I downloaded from NCBI's FTP site. The file is 700 GB, and I'm using Bio::DB::Fasta to index it. When the index file reaches around 10 GB, indexing seems to hang. Is there a limit on the size of the FASTA file it can index?

Also, how does Bio::DB::Fasta compare to Bio::Index::Fasta? Is one better for large FASTA files? Are there any other indexing schemes I could use instead of these modules? Any information would be appreciated.

Thanks,

KB

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: Limitations for Bio::DB::Fasta

Fields, Christopher J
Which file? It’s something we could probably check. My guess is that it is one or more of the following:

1) Your version of perl doesn’t support large files efficiently (unlikely unless you are using an old version of perl). I would expect that to fail outright rather than hang, though.

2) DB_File itself isn’t very efficient if you have a very large number of sequences (millions). Is that the case?

3) I/O is the bottleneck; in other words, you are running this on a system where disk is the limiting factor.

Hard to say without testing it directly.
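
One quick way to answer the question in point 2 is to count the records before indexing. The following is a minimal Python sketch, not part of BioPerl; the function name `count_fasta_records` is made up for illustration, and it assumes every record starts with `>` at the beginning of a line.

```python
def count_fasta_records(path):
    """Count FASTA records by streaming the file and counting header lines.

    Assumption: each record begins with a line whose first byte is '>'.
    The file is read in binary mode, line by line, so memory use stays
    small even for very large files.
    """
    count = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                count += 1
    return count
```

Calling `count_fasta_records("my_sequences.fasta")` (a hypothetical path) returns the number of header lines; a result in the millions or more would make the DB_File explanation more plausible.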

Note that there are alternatives (samtools faidx comes to mind).
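
To illustrate the kind of flat index that samtools faidx writes, here is a hedged Python sketch. It is not the samtools implementation. It records the five columns of a `.fai` file (name, length, byte offset of the first base, bases per line, bytes per line) and uses them to fetch a subsequence with a single seek. The function names `build_fai` and `fetch` are invented for this example, and the sketch assumes every sequence line within a record has the same length except the last.

```python
def build_fai(fasta_path):
    """Return faidx-style records: (name, length, offset, linebases, linewidth)."""
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                # Close the previous record before starting a new one.
                if name is not None:
                    records.append((name, length, offset, linebases, linewidth))
                parts = raw[1:].split()
                name = parts[0].decode("ascii") if parts else ""
                length = linebases = linewidth = 0
                offset = pos + len(raw)  # first base follows the header line
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0:
                    # The first sequence line defines the line geometry.
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


def fetch(fasta_path, record, start, end):
    """Return bases [start, end) (0-based) of one record using a seek."""
    _name, _length, offset, linebases, linewidth = record
    first = offset + (start // linebases) * linewidth + start % linebases
    last = offset + (end // linebases) * linewidth + end % linebases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    # Drop the line terminators that fall inside the requested range.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

Because each record is a handful of integers rather than a DB_File entry, an index of this shape stays small and lookups need only arithmetic plus one seek. Whether that removes the hang described above would still have to be tested on the actual file.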

chris

On Jul 7, 2014, at 1:12 PM, Ki Baik <[hidden email]> wrote:

> I'm trying to index a large fasta file that I downloaded from NCBI's ftp site. The size of the fasta file is 700GB. I'm trying to use Bio::DB::Fasta to index this file. When the index file hits around 10GB, it seems to hang. I'm wondering if there is a limit on the fasta file size it can index.
>
> Also, how does Bio::DB::Fasta compare to Bio::Index::Fasta? Is one better for large fasta files? Are there any other indexing schemes I can use instead of these modules? Any information would be appreciated.
>
> Thanks,
>
> KB