use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin
Hi, dear Perlers,

I'm trying to use Bio::SeqIO to read FASTA sequences from a pipe or from @ARGV,
like "while (<>) {....}". After several rounds of trial and error I have failed and
need to ask you for help. Could you please check or try the
following code?

Thanks in advance.

---------------------------------------
use Bio::SeqIO ;
use Statistics::Descriptive ;

my %opt = () ;
my $sta = Statistics::Descriptive::Full->new();

##### here is the key, I think.
my $in = Bio::SeqIO->new(-format=>"Fasta");
while(my $s = $in->next_seq()){
    $sta->add_data($s->length()) ;
}
print $sta->sum() if $opt{sum} ;

__DATA__
>ct1
AGAGAGAGA
>ctg2
ATATATAT
-----------------------------------------------

Regards

Haiyan






_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Paul Cantalupo
Hi Haiyan,

You need to pass the '-fh' option to Bio::SeqIO->new and point it at Perl's DATA filehandle, like so:
my $in = Bio::SeqIO->new(-fh => \*DATA, -format => 'fasta');

This is from http://perldoc.perl.org/perldata.html#Special-Literals:
"Text after __DATA__ may be read via the filehandle PACKNAME::DATA, where PACKNAME is the package that was current when the __DATA__ token was encountered."


Good luck,

Paul



Paul Cantalupo
University of Pittsburgh



Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin
Hi Paul,

Thanks for your quick reply and advice. I have tried your method, but
it complains:

------------------------------------
[linhy@bioinfo1 Script]$ less try.fa | perl ./fastaLen.pl
Name "main::DATA" used only once: possible typo at ./fastaLen.pl line 42.
Use of uninitialized value in print at ./fastaLen.pl line 49.
------------------------------------


I have also tried "-fh=>\*STDIN" and "-fh=><>", following the
documentation shown by "perldoc Bio::SeqIO" at the terminal, but I still
haven't got what I want. Thanks again.

... ...

   Bio::SeqIO->new()
          $seqIO = Bio::SeqIO->new(-file => 'filename', -format=>$format);
          $seqIO = Bio::SeqIO->new(-fh   => \*FILEHANDLE, -format=>$format);
          $seqIO = Bio::SeqIO->new(-format => $format);

       ... ...

       -fh  You may provide new() with a previously-opened filehandle.
            For example, to read from STDIN:

               $seqIO = Bio::SeqIO->new(-fh => \*STDIN);

            Note that you must pass filehandles as references to globs.

            If neither a filehandle nor a filename is specified, then
            the module will read from the @ARGV array or STDIN, using the
            familiar <> semantics.

-------------------------------
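
The excerpt above turns on which filehandle is handed over. As a rough sketch of why that choice matters, the plain-Perl stand-in below reads FASTA records from whatever handle it is given. It does not use Bio::SeqIO, and `fasta_lengths` is a hypothetical helper invented for illustration. The point is only that `\*STDIN` covers piped input, whereas `\*ARGV` is the handle behind `<>` and covers both file names on the command line and piped input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the length of each FASTA record read from $fh.
sub fasta_lengths {
    my ($fh) = @_;
    my @lengths;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>/) {
            push @lengths, 0;              # header line starts a new record
        }
        elsif (@lengths) {
            $line =~ s/\s+//g;
            $lengths[-1] += length $line;  # add residues to the current record
        }
    }
    return @lengths;
}

# Demo on an in-memory handle so the sketch runs without any input file.
my $fasta = ">ct1\nAGAGAGAGA\n>ctg2\nATATATAT\n";
open my $mem, '<', \$fasta or die "cannot open in-memory handle: $!";
my @len = fasta_lengths($mem);
print "@len\n";    # prints: 9 8

# In a real script the same call could take the magic <> handle instead:
#   my @len = fasta_lengths(\*ARGV);
# Assumption, not confirmed in this thread: Bio::SeqIO->new(-fh => \*ARGV, ...)
# would behave the same way, since -fh takes a reference to a glob.
```

Reading through `\*ARGV` means the script does not have to know in advance whether its data arrives as a file name or through a pipe.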



Regards

Haiyan





Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Torsten Seemann
In reply to this post by Haiyan Lin
> Thanks for your quick reply and advice. I have tried your method, but it complains:
>
> ------------------------------------
> [linhy@bioinfo1 Script]$ less try.fa | perl ./fastaLen.pl
> Name "main::DATA" used only once: possible typo at ./fastaLen.pl line 42.
> Use of uninitialized value in print at ./fastaLen.pl line 49.
> ------------------------------------

I don't think you mean to use the "less" command. Try one of these:

% cat try.fa | perl ./fastaLen.pl

or

% perl ./fastaLen.pl < try.fa


--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA


Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Mark A. Jensen
In reply to this post by Haiyan Lin

You should do
cat try.fa | perl ..
rather than less, IMO. less formats things.

You can add
  no warnings qw/once/;
to get rid of that warning, it shouldn't affect
the program.
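
A narrower variant, offered as a sketch only: the pragma can be confined to the block that mentions the handle, so the rest of the script keeps its "used only once" checks. `embedded_data_handle` is a hypothetical helper name. Silencing the warning does not supply any data, though. The warning most likely means the script has no __DATA__ section for the DATA handle to read.

```perl
use strict;
use warnings;

# Hypothetical helper: the pragma applies only inside this block, so the
# single mention of DATA here is not reported as a possible typo.
sub embedded_data_handle {
    no warnings 'once';
    return \*DATA;
}

my $fh = embedded_data_handle();
print ref($fh), "\n";    # prints: GLOB
```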

MAJ



Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin
Thanks to Mark and Torsten for their advice.

I can get what I want with both "less" and "cat" by writing the data from
the upstream pipeline into a temporary file, then processing and
removing it. But this is not efficient, and I'm looking for a better method.
Can I avoid writing and removing the file? Thanks for any
advice.

##### Following is MyData
[linhy@bioinfo1 Script]$ more try.fa
>Contig000001
CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>hello
ATATATTTTT

##### Following is my code
[linhy@bioinfo1 Script]$ more fastaLen.pl
#!/usr/bin/env perl

use strict;
use warnings;
use utf8;
use Bio::SeqIO ;
use Getopt::Long;
use Statistics::Descriptive ;

my %opt = () ;
GetOptions(
    "sum"=>\$opt{sum},
);

my $tf = "/tmp/subFasta.len.PID$$.fa";
open WTF,">$tf" or die $!;
while(<>){
    print WTF  ;
}
close WTF;

my $sta = Statistics::Descriptive::Full->new();

my $in = Bio::SeqIO->new(-format=>"Fasta",-file=>$tf);

#### I'm failed with the following commented line
#my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);

while(my $s = $in->next_seq()){
    $sta->add_data($s->length()) ;
}
`rm $tf` ;

if ( $opt{sum} ) {
    print $sta->sum(), "\n"  ;
}

exit;


##### Output of the above code and data, using both "cat" and "less"
[linhy@bioinfo1 Script]$ cat try.fa | ./fastaLen.pl -s
70
[linhy@bioinfo1 Script]$ less try.fa | ./fastaLen.pl -s
70
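
If the temporary file is kept, the hand-built /tmp path and the `rm` call in the script above could be replaced by the core File::Temp module, which picks a unique name and removes the file automatically. This is a minimal sketch that assumes nothing about Bio::SeqIO itself, and `spool_to_tempfile` is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: copy everything readable from $in_fh into a
# temporary file and return the File::Temp object. The file is deleted
# automatically when the returned object goes out of scope.
sub spool_to_tempfile {
    my ($in_fh) = @_;
    my $tmp = File::Temp->new(SUFFIX => '.fa');
    while (my $line = <$in_fh>) {
        print {$tmp} $line;
    }
    $tmp->flush;    # push buffered data to the file before it is reopened
    return $tmp;
}

# Demo with an in-memory handle; a real script would pass \*ARGV or \*STDIN.
my $fasta = ">hello\nATATATTTTT\n";
open my $mem, '<', \$fasta or die "cannot open in-memory handle: $!";
my $tmp = spool_to_tempfile($mem);
print "spooled to ", $tmp->filename, "\n";
```

The name returned by `$tmp->filename` could then be passed where the script currently passes `$tf`.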
















Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Mark A. Jensen
Haiyan - I think you want the following--

##########

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Bio::SeqIO ;
use Getopt::Long;
use Statistics::Descriptive ;

no warnings qw/once/;

my %opt = () ;
GetOptions(
     "sum"=>\$opt{sum},
);
my $sta = Statistics::Descriptive::Full->new();

my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);

while(my $s = $in->next_seq()){
     $sta->add_data($s->length()) ;
}

# no temporary file is used here, so there is nothing to remove

if ( $opt{sum} ) {
     print $sta->sum(), "\n"  ;
}

# the DATA goes here:
__END__
>Contig000001
CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>hello
ATATATTTTT

############




Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin
In reply to this post by Haiyan Lin
Hi George,

Thanks, your code works for data embedded in the code with "__DATA__",
but it fails to handle data from outside. The data and result
follow.

[linhy@bioinfo1 Script]$ more try.fa
>Contig000001
CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>hello
ATATATTTTT
[linhy@bioinfo1 Script]$ cat try.fa | perl try.pl
seq is AGAGAGAGA
seq is ATATATAT

Thanks

Haiyan



On Sat, 2014-07-12 at 18:36 -0700, george hartzell wrote:

> I just did this on my Mac OS X 10.9.4 system with perl 5.18.2:
>
> cd tmp
> mkdir haiyan
> cd haiyan
> cpanm -n -L local Bio::SeqIO
>
> perl -Ilocal/lib/perl5 foo.pl
>
> With the following little program in foo.pl:
>
> use Bio::SeqIO ;
>
> my $in = Bio::SeqIO->new(-format=>"Fasta", -fh => \*DATA);
> while(my $s = $in->next_seq()){
>     print "seq is " . $s->seq . "\n"
> }
>
> __DATA__
> >ct1
> AGAGAGAGA
> >ctg2
> ATATATAT
>
> and it does this when I run it:
>
> (alacrity)[18:22:58]haiyan>>perl -Ilocal/lib/perl5 foo.pl
> seq is AGAGAGAGA
> seq is ATATATAT
> (alacrity)[18:24:42]haiyan>>
>
> Can you get the same series of things to work?
>
> If you’re doing a bunch of this kind of stuff, you might want to look
> at Data::Section; rjbs discusses it
> here. It’s warmer and fuzzier than dealing with the DATA handle by
> yourself.
>
> g.



Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Fields, Christopher J
Haiyan,

Do you want a test script that uses the DATA handle or a script that can pull in data from STDIN (e.g. from outside)?  They are not the same.  It’s possible to do both but I think you’re conflating purposes here; someone using this script might not expect it to behave both ways.

chris
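
One way to picture the distinction drawn above, as a hedged sketch rather than a recommendation: decide up front whether any outside input exists, and fall back to the embedded data only when it does not. `pick_source` is a hypothetical helper, and the mapping of its return values to `\*ARGV` and `\*DATA` is an assumption about how a script would use it.

```perl
use strict;
use warnings;

# Hypothetical helper: name the handle a script should read from.
#   $n_args       - number of file names left in @ARGV after option parsing
#   $stdin_is_tty - true when STDIN is an interactive terminal (nothing piped)
sub pick_source {
    my ($n_args, $stdin_is_tty) = @_;
    return 'ARGV' if $n_args > 0;          # files were named on the command line
    return 'ARGV' unless $stdin_is_tty;    # data is piped or redirected in
    return 'DATA';                         # no outside input: embedded test data
}

# -t STDIN is true when STDIN is attached to a terminal.
my $source = pick_source(scalar @ARGV, (-t STDIN) ? 1 : 0);
print "reading from $source\n";

# Assumed mapping: 'ARGV' would become -fh => \*ARGV and 'DATA' would
# become -fh => \*DATA in the Bio::SeqIO->new call.
```

Keeping the decision in one small function makes the script's behavior explicit, which addresses the concern that a user might not expect it to behave both ways.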


Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin
First, thanks to everyone for the replies and advice.

I want a single script that can handle data provided in two ways:

1) Read data from a file, such as "perl fastaLen.pl try.fa". No problem
with "-file=>$filename".

2) Read data from the output of a command, such as "less try.fa | perl
fastaLen.pl". No problem when using "-fh=>\*STDIN".


I can do this by first reading the data through <>, writing it into a
tmp file, creating an instance of Bio::SeqIO with "-file=>$tmpFile", and
then removing the tmp file.  I'm looking for a way to avoid
writing/removing the tmp file.

Thanks.

Haiyan
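
One way to avoid the temporary file may be to hand Bio::SeqIO the same handle that <> reads from. The sketch below is not taken from the thread: it assumes that Bio::SeqIO's -fh option accepts Perl's built-in ARGV handle (a plain glob reference, like \*STDIN) and that the fasta parser reads through it normally. The tab-separated output is only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# \*ARGV is the handle behind the <> operator: it reads each file named
# in @ARGV in turn, or STDIN when @ARGV is empty.
my $in = Bio::SeqIO->new(-fh => \*ARGV, -format => 'fasta');

# Illustrative output: sequence ID and length, tab-separated.
while (my $seq = $in->next_seq) {
    print $seq->display_id, "\t", $seq->length, "\n";
}
```

Saved as fastaLen.pl, this should cover both "perl fastaLen.pl try.fa" and "less try.fa | perl fastaLen.pl", as well as "perl fastaLen.pl < try.fa", without any test on what STDIN is.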







Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Paul Cantalupo
Hi Haiyan,

Ah, your specifications are now clearer! I use the '-p' file test to check whether STDIN is a pipe (http://perldoc.perl.org/functions/-X.html) to do the same thing that you are trying to do:

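use strict;
use warnings;
use Bio::SeqIO;

my $in;
if (-p STDIN) {
  # STDIN is a pipe (e.g. "cat foo.fa | both.pl"): read from it
  $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fasta');
}
else {
  # otherwise expect the FASTA file name as the first argument
  my $file = shift;
  die "Usage: $0 file.fa  (or pipe FASTA data into STDIN)\n" unless defined $file;
  $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
}
# print the length of each sequence, one per line
while (my $seq = $in->next_seq) {
  print $seq->length,"\n";
}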


Here is my input file and results when running both ways:

$ cat foo.fa
>foo1
tgtagtc
>foo2
taaaacgtgtcat

$ cat foo.fa | both.pl
7
13

$ both.pl foo.fa
7
13


Hope this helps,

Paul


P.S. Your reply to Chris Fields is exactly how you should ask a question next time. It will enable us to help you better ;)






Paul Cantalupo
University of Pittsburgh



Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

George Hartzell-2
In reply to this post by Haiyan Lin

Haiyan Lin writes:
 > First, thanks to everyone for the replies and advice.
 >
 > I want a single script that can handle data provided in two ways:
 >
 > 1) Read data from a file, such as "perl fastaLen.pl try.fa". No problem
 > with "-file=>$filename".
 >
 > 2) Read data from the output of a command, such as "less try.fa | perl
 > fastaLen.pl". No problem when using "-fh=>\*STDIN".
 >
 >
 > I can do this by first reading the data through <>, writing it into a
 > tmp file, creating an instance of Bio::SeqIO with "-file=>$tmpFile", and
 > then removing the tmp file.  I'm looking for a way to avoid
 > writing/removing the tmp file.

Paul suggested the use of -p to test whether STDIN is a pipe or not,
which strictly works to your specs.  I don't think it does what *I*
would expect if you redirect input from a file (foo.pl < moose.fa),
since STDIN is then a regular file rather than a pipe, but that
doesn't seem to be one of your requirements.

Alternatively, you could be more explicit about how it is supposed to
behave.  One thought would be to use an option on the command line to
specify the input file and use the name "-" to specify that the program
should read from stdin.  There is a lot of Unix tradition in that
pattern.

Alternatively^2, you could specify the name of the input file on the
command line (with an option or without) and if no name was specified
then read from stdin.

g.
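
The two alternatives above can be combined in one small script. This is only a sketch, not code from the thread: input_args is a hypothetical helper name, and the choice to treat both "-" and a missing argument as "read STDIN" is an assumption about the desired behaviour.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: turn the optional command-line name into the
# arguments for Bio::SeqIO->new.  No name, or "-", means "read STDIN".
sub input_args {
    my ($name) = @_;
    return (-fh => \*STDIN) if !defined $name || $name eq '-';
    return (-file => $name);
}

my $in = Bio::SeqIO->new(input_args(shift @ARGV), -format => 'fasta');

# Print the length of each sequence, one per line.
while (my $seq = $in->next_seq) {
    print $seq->length, "\n";
}
```

Because the decision depends only on the arguments and never on what kind of thing STDIN is, "foo.pl moose.fa", "foo.pl < moose.fa", "foo.pl - < moose.fa" and "cat moose.fa | foo.pl" should all behave the same way.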


Re: use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin
Hi, George,

Thanks for your advice.

Best wishes!

Haiyan
