clustalw.pm: could not open sequence file error

Olena Morozova
Hi all,

I am trying to use this script:

use Bio::Tools::Run::Alignment::Clustalw;

$ENV{CLUSTALDIR} = 'C:/perl/clustalw1.8/';
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM',
              'outfile' => 'al_mouse.txt');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $inputfilename = 'c:/perl/mouse_unique.txt';
my $aln = $factory->align($inputfilename);

to do an MSA. It works for a test file with 2 or 3 sequences.
However, when I run it on my actual file (97 sequences), which is in
exactly the same format as the test file (FASTA), I get a
"could not open the sequence file" error.
Is this because the file is too big, and is there a way to fix this?
Thanks a lot for your help!

Olena

On 11/29/05, Jason Stajich <[hidden email]> wrote:

>
>
> Begin forwarded message:
>
> > From: neeti somaiya <[hidden email]>
> > Date: November 29, 2005 1:27:27 AM EST
> > To: Jason Stajich <[hidden email]>
> > Subject: Re: [Bioperl-l] need BLAT parse code
> >
> > I use the following code :
> >
> > open(FH,"output.psl");
> > while(<FH>)
> > {
> >     if( /^psLayout/ )
> >     {
> >           for( 1..4 ) { <> }
> >       }
> >      my @line = split;
> >      my ( $matches,$mismatches,$rep_matches,$n_count,
> >             $q_num_insert,$q_base_insert,
> >             $t_num_insert, $t_base_insert,
> >             $strand, $q_name, $q_length, $q_start,
> >             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
> >             $block_sizes,  $q_starts,      $t_starts
> >             ) = split;
> >
> >
> >       print $t_start;
> >       print "\n";
> >       print $t_end;
> >
> > }
> >
> > for output.psl file :
> >
> > match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockSizes      qStarts  tStarts
> >         match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
> > ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> > 27025   0       0       0       0       0       0       0       +       query_sequence3 27025   0       27025   database_sequence3      57701691        132995  160020  1       27025,  0,      132995,
> > ~
> >
> >
> > It gave me output :
> >
> > Q
> > Q
> >
> > 132995
> > 160020
> >
> > What is the Q? Can't I obtain the coordinates (132995, 160020) alone?
> >
> > Please let me know.
> > Thanks.
> >
> > On 11/28/05, Jason Stajich <[hidden email]> wrote:
> > Bio::SearchIO::psl can parse psl output.
> >
> > or more simply:
> >
> > while(<>) {
> >    if( /^psLayout/ ) { # if there is a header
> >    for( 1..4 ) { <> }  # take next 4 lines to skip the header
> >    }
> >   my @line = split;
> >   my ( $matches,$mismatches,$rep_matches,$n_count,
> >              $q_num_insert,$q_base_insert,
> >              $t_num_insert, $t_base_insert,
> >              $strand, $q_name, $q_length, $q_start,
> >              $q_end, $t_name, $t_length,$t_start, $t_end,
> > $block_count,
> >              $block_sizes,  $q_starts,      $t_starts
> >              ) = split;
> >
> >   #  query aln vals are  $q_start, and $q_end values
> >   # hit aln vals are $t_start, $t_end
> > }
> >
> > On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > I am using BLAT in a project. I have simple .psl output files
> > > from running BLAT of gene sequences against full chromosomal
> > > sequences. Does anyone have simple BLAT parsing code? I am only
> > > interested in obtaining the alignment start and end positions
> > > on the target.
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > [hidden email]
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
> >
> >
> >
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
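
On the "Q" lines in the forwarded BLAT question quoted above: the posted output.psl has no line beginning with psLayout, so the header-skipping branch never runs and the column-header rows are split like data rows. When the first header row is split on whitespace, the 16th and 17th tokens are both "Q", which would explain the two stray lines. A minimal sketch of one way to avoid this, assuming that every data row starts with an integer match count (the sub name psl_target_coords is made up for illustration):

```perl
use strict;
use warnings;

# Return (t_start, t_end) for a PSL data row, or an empty list for
# header, separator, or blank lines.
sub psl_target_coords {
    my ($line) = @_;
    my @f = split ' ', $line;
    # Assumption: data rows begin with an integer match count and have
    # at least 17 whitespace-separated columns.
    return unless @f >= 17 && $f[0] =~ /^\d+$/;
    return @f[15, 16];    # tStart and tEnd (0-based columns 15 and 16)
}

# Sample lines taken from the post: a header row, a separator, a data row.
my @lines = (
    "match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts",
    "--------------------",
    "27025 0 0 0 0 0 0 0 + query_sequence3 27025 0 27025 database_sequence3 57701691 132995 160020 1 27025, 0, 132995,",
);

for my $line (@lines) {
    my @coords = psl_target_coords($line);
    next unless @coords;             # skip anything that is not a data row
    print join("\t", @coords), "\n"; # prints the target start and end
}
```

Note also that inside a while(<FH>) loop, a bare <> reads from the files named on the command line (or STDIN), not from FH, so skipping header lines that way would need <FH> instead.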

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l
Re: clustalw.pm: could not open sequence file error

smarkel
Olena,

Are you getting a BioPerl error or a ClustalW error?

What happens if you invoke ClustalW directly on your input
file, i.e., without using BioPerl?

Scott

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  [hidden email]
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com
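
Scott's suggestion to run ClustalW directly, without BioPerl, could be tried along these lines. This is only a sketch: the executable name clustalw.exe and the helper clustalw_command are assumptions, and while ClustalW documents INFILE, OUTFILE and ALIGN options, some Windows builds may expect "/" rather than "-" as the option prefix, so check the help output of your copy first.

```perl
use strict;
use warnings;

# Build the argument list for a direct ClustalW run.
# Assumption: options are spelled INFILE=, OUTFILE= and ALIGN, with a
# '-' prefix by default; pass '/' as the fourth argument if your build
# expects DOS-style options.
sub clustalw_command {
    my ($exe, $infile, $outfile, $prefix) = @_;
    $prefix = '-' unless defined $prefix;
    return ($exe,
            "${prefix}INFILE=$infile",
            "${prefix}OUTFILE=$outfile",
            "${prefix}ALIGN");
}

# The executable path is a guess based on the CLUSTALDIR in the post.
my @cmd = clustalw_command('C:/perl/clustalw1.8/clustalw.exe',
                           'c:/perl/mouse_unique.txt',
                           'al_mouse.txt');
print join(' ', @cmd), "\n";

# Uncomment to actually run ClustalW and report a non-zero exit status:
# my $status = system(@cmd);
# warn "ClustalW exited with status ", $status >> 8, "\n" if $status != 0;
```

If the same "could not open sequence file" message appears here, the problem lies with ClustalW or the input file rather than with the BioPerl wrapper.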

RE: clustalw.pm: could not open sequence file error

Barry Moore
Olena,

Does the path to the file in question contain any spaces?  I know
ClustalX won't open files with a space in the path, even though
Windows allows that.  I don't know for sure about ClustalW, but it
seems likely to behave the same way.

Barry
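
Barry's question about spaces can be checked before calling align(). The sketch below uses only core Perl; the helper names check_sequence_path and count_fasta_records are made up for illustration, and the file path is the one from the original post.

```perl
use strict;
use warnings;

# Return a list of human-readable problems found with a sequence file path.
sub check_sequence_path {
    my ($path) = @_;
    my @problems;
    push @problems, "path contains whitespace" if $path =~ /\s/;
    if    (!-e $path) { push @problems, "file does not exist" }
    elsif (!-r $path) { push @problems, "file is not readable" }
    return @problems;
}

# Count FASTA records by counting header lines that start with '>'.
sub count_fasta_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /^>/;
    }
    close $fh;
    return $n;
}

my $file = 'c:/perl/mouse_unique.txt';
my @problems = check_sequence_path($file);
if (@problems) {
    print "$file: $_\n" for @problems;
} else {
    printf "%s looks usable (%d sequences)\n", $file, count_fasta_records($file);
}
```

A record count that differs from the expected 97 would hint at a formatting problem in the file rather than a size limit.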
-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Olena
Morozova
Sent: Tuesday, November 29, 2005 3:34 PM
To: bioperl-ml List
Subject: [Bioperl-l] clustalw.pm: could not open sequence file error
