Fwd: need BLAT parse code

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Fwd: need BLAT parse code

Jason Stajich


Begin forwarded message:

> From: neeti somaiya <[hidden email]>
> Date: November 29, 2005 1:27:27 AM EST
> To: Jason Stajich <[hidden email]>
> Subject: Re: [Bioperl-l] need BLAT parse code
>
> I use the following code :
>
> open(FH,"output.psl");
> while(<FH>)
> {
>     if( /^psLayout/ )
>     {
>           for( 1..4 ) { <> }
>       }
>      my @line = split;
>      my ( $matches,$mismatches,$rep_matches,$n_count,
>             $q_num_insert,$q_base_insert,
>             $t_num_insert, $t_base_insert,
>             $strand, $q_name, $q_length, $q_start,
>             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>             $block_sizes,  $q_starts,      $t_starts
>             ) = split;
>
>
>       print $t_start;
>       print "\n";
>       print $t_end;
>
> }
>
> for output.psl file :
>
> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap    
> strand  Q               Q       Q       Q       T                
> T       T       T       block   blockSizes      qStarts  tStarts
>         match   match           count   bases   count    
> bases           name            size    start   end      
> name            size    start   end     count
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> -------------------
> 27025   0       0       0       0       0       0       0        
> +       query_sequence3 27025   0       27025    
> database_sequence3      57701691        132995  160020  1        
> 27025,  0,      132995,
> ~
>
>
> It gave me output :
>
> Q
> Q
>
> 132995
> 160020
>
> What is the Q? Cant I obtain the coordinates (132995, 160020) alone?
>
> Please let me know.
> Thanks.
>
> On 11/28/05, Jason Stajich <[hidden email]> wrote:
> Bio::SearchIO::psl can parse psl output.
>
> or more simply:
>
> while(<>) {
>    if( /^psLayout/ ) { # if there is a header
>    for( 1..4 ) { <> }  # take next 4 lines to skip the header
>    }
>   my @line = split;
>   my ( $matches,$mismatches,$rep_matches,$n_count,
>              $q_num_insert,$q_base_insert,
>              $t_num_insert, $t_base_insert,
>              $strand, $q_name, $q_length, $q_start,
>              $q_end, $t_name, $t_length,$t_start, $t_end,  
> $block_count,
>              $block_sizes,  $q_starts,      $t_starts
>              ) = split;
>
>   #  query aln vals are  $q_start, and $q_end values
>   # hit aln vals are $t_start, $t_end
> }
>
> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > I am using BLAT in a project.I am having simple .psl output files
> > after
> > running BLAT of a gene sequences against full chromosomal
> > sequences.Doesanyone have a simple BLAT parse code. I am only
> > interested in obtaining the
> > alignment start and end positions on the target.
> > --
> > -Neeti
> > Even my blood says, B positive
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > [hidden email]
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
>
>
> --
> -Neeti
> Even my blood says, B positive

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: need BLAT parse code

Tamas Horvath
what kind of blast output format do u use?

On 11/29/05, Jason Stajich <[hidden email]> wrote:>>>> Begin forwarded message:>> > From: neeti somaiya <[hidden email]>> > Date: November 29, 2005 1:27:27 AM EST> > To: Jason Stajich <[hidden email]>> > Subject: Re: [Bioperl-l] need BLAT parse code> >> > I use the following code :> >> > open(FH,"output.psl");> > while(<FH>)> > {> >     if( /^psLayout/ )> >     {> >           for( 1..4 ) { <> }> >       }> >      my @line = split;> >      my ( $matches,$mismatches,$rep_matches,$n_count,> >             $q_num_insert,$q_base_insert,> >             $t_num_insert, $t_base_insert,> >             $strand, $q_name, $q_length, $q_start,> >             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,> >             $block_sizes,  $q_starts,      $t_starts> >             ) = split;> >> >> >       print $t_start;> >       print "\n";> >       print $t_end;> >> > }> >> > for output.psl file :> >> > match   mis-    rep.    N's     Q gap   Q gap   T gap  !
 T gap> > strand  Q               Q       Q       Q       T> > T       T       T       block   blockSizes      qStarts  tStarts> >         match   match           count   bases   count> > bases           name            size    start   end> > name            size    start   end     count> > ----------------------------------------------------------------------> > ----------------------------------------------------------------------> > -------------------> > 27025   0       0       0       0       0       0       0> > +       query_sequence3 27025   0       27025> > database_sequence3      57701691        132995  160020  1> > 27025,  0,      132995,> > ~> >> >> > It gave me output :> >> > Q> > Q> >> > 132995> > 160020> >> > What is the Q? Cant I obtain the coordinates (132995, 160020) alone?> >> > Please let me know.> > Thanks.> >> > On 11/28/05, Jason Stajich <[hidden email]> wrote:> > Bio::SearchIO::psl can parse psl output.> >> > or more simply:> >> > while(<>) {>!
 >    if( /^psLayout/ ) { # if there is a header> >    for( 1..4 ) { <> }  # take next 4 lines to skip the header> >    }> >   my @line = split;> >   my ( $matches,$mismatches,$rep_matches,$n_count,> >              $q_num_insert,$q_base_insert,> >              $t_num_insert, $t_base_insert,> >              $strand, $q_name, $q_length, $q_start,> >              $q_end, $t_name, $t_length,$t_start, $t_end,> > $block_count,> >              $block_sizes,  $q_starts,      $t_starts> >              ) = split;> >> >   #  query aln vals are  $q_start, and $q_end values> >   # hit aln vals are $t_start, $t_end> > }> >> > On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:> >> > > Hi,> > >> > > I am using BLAT in a project.I am having simple .psl output files> > > after> > > running BLAT of a gene sequences against full chromosomal> > > sequences.Doesanyone have a simple BLAT parse code. I am only> > > interested in obtaining the> > > alignment start and end positions on the target.> > !
> --> > > -Neeti> > > Even my blood says, B positive> > >> > > _______________________________________________> > > Bioperl-l mailing list> > > [hidden email]> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l> >> > --> > Jason Stajich> > Duke University> > http://www.duke.edu/~jes12> >> >> >> >> >> > --> > -Neeti> > Even my blood says, B positive>> --> Jason Stajich> Duke University> http://www.duke.edu/~jes12>>> _______________________________________________> Bioperl-l mailing list> [hidden email]> http://portal.open-bio.org/mailman/listinfo/bioperl-l>
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: need BLAT parse code

Jason Stajich
In reply to this post by Jason Stajich
You probably need to change this line,
>  for( 1..4 ) { <> }
to
>  for( 1..4 ) { <FH> }

If your BLAT result file doesn't start with psLayout then you need to  
make move that if statement BEFORE the while(<FH>) loop starts, it is  
stripping off the header lines (and hence where the mysterious 'Q' is  
coming from in you output!).

You can also run blat to just return output without the header and  
you don't need those skipping steps at all.

 From the blat options:
  -noHead     suppress .psl header (so it's just a tab-separated file)


If parsing tab delimited files seems difficult go buy a book on  
programming perl or read some of the myriad of freely available  
online documentation and read about the split function.

-jason

On Nov 29, 2005, at 1:27 AM, neeti somaiya wrote:

> I use the following code :
>
> open(FH,"output.psl");
> while(<FH>)
> {
>     if( /^psLayout/ )
>     {
>           for( 1..4 ) { <FH> }
>       }
>      my @line = split;
>      my ( $matches,$mismatches,$rep_matches,$n_count,
>             $q_num_insert,$q_base_insert,
>             $t_num_insert, $t_base_insert,
>             $strand, $q_name, $q_length, $q_start,
>             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>             $block_sizes,  $q_starts,      $t_starts
>             ) = split;
>
>
>       print $t_start;
>       print "\n";
>       print $t_end;
>
> }
>
> for output.psl file :
>
> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap    
> strand  Q               Q       Q       Q       T                
> T       T       T       block   blockSizes      qStarts  tStarts
>         match   match           count   bases   count    
> bases           name            size    start   end      
> name            size    start   end     count
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> -------------------
> 27025   0       0       0       0       0       0       0        
> +       query_sequence3 27025   0       27025    
> database_sequence3      57701691        132995  160020  1        
> 27025,  0,      132995,
> ~
>
>
> It gave me output :
>
> Q
> Q
>
> 132995
> 160020
>
> What is the Q? Cant I obtain the coordinates (132995, 160020) alone?
>
> Please let me know.
> Thanks.
>
> On 11/28/05, Jason Stajich <[hidden email]> wrote:
> Bio::SearchIO::psl can parse psl output.
>
> or more simply:
>
> while(<>) {
>    if( /^psLayout/ ) { # if there is a header
>    for( 1..4 ) { <> }  # take next 4 lines to skip the header
>    }
>   my @line = split;
>   my ( $matches,$mismatches,$rep_matches,$n_count,
>              $q_num_insert,$q_base_insert,
>              $t_num_insert, $t_base_insert,
>              $strand, $q_name, $q_length, $q_start,
>              $q_end, $t_name, $t_length,$t_start, $t_end,  
> $block_count,
>              $block_sizes,  $q_starts,      $t_starts
>              ) = split;
>
>   #  query aln vals are  $q_start, and $q_end values
>   # hit aln vals are $t_start, $t_end
> }
>
> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > I am using BLAT in a project.I am having simple .psl output files
> > after
> > running BLAT of a gene sequences against full chromosomal
> > sequences.Doesanyone have a simple BLAT parse code. I am only
> > interested in obtaining the
> > alignment start and end positions on the target.
> > --
> > -Neeti
> > Even my blood says, B positive
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > [hidden email]
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
>
>
> --
> -Neeti
> Even my blood says, B positive

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: need BLAT parse code

Sean Davis-3
In reply to this post by Jason Stajich
Neeti,

You could simply put the file in a text editor and take out the header.
Alternatively, type:

 blat

without any arguments.  You will notice that there are many options to blat,
one of which is -noHead, which suppresses the header.  Or, look at only
lines that begin with a number using a regular expression.

Ultimately, I think that it will serve you well to read a perl book, though,
as parsing a text file is an important and basic topic to grasp if you want
to use perl for data analysis.

Sean

On 11/29/05 9:43 AM, "Jason Stajich" <[hidden email]> wrote:

>
>
> Begin forwarded message:
>
>> From: neeti somaiya <[hidden email]>
>> Date: November 29, 2005 1:27:27 AM EST
>> To: Jason Stajich <[hidden email]>
>> Subject: Re: [Bioperl-l] need BLAT parse code
>>
>> I use the following code :
>>
>> open(FH,"output.psl");
>> while(<FH>)
>> {
>>     if( /^psLayout/ )
>>     {
>>           for( 1..4 ) { <> }
>>       }
>>      my @line = split;
>>      my ( $matches,$mismatches,$rep_matches,$n_count,
>>             $q_num_insert,$q_base_insert,
>>             $t_num_insert, $t_base_insert,
>>             $strand, $q_name, $q_length, $q_start,
>>             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>>             $block_sizes,  $q_starts,      $t_starts
>>             ) = split;
>>
>>
>>       print $t_start;
>>       print "\n";
>>       print $t_end;
>>
>> }
>>
>> for output.psl file :
>>
>> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap
>> strand  Q               Q       Q       Q       T
>> T       T       T       block   blockSizes      qStarts  tStarts
>>         match   match           count   bases   count
>> bases           name            size    start   end
>> name            size    start   end     count
>> ----------------------------------------------------------------------
>> ----------------------------------------------------------------------
>> -------------------
>> 27025   0       0       0       0       0       0       0
>> +       query_sequence3 27025   0       27025
>> database_sequence3      57701691        132995  160020  1
>> 27025,  0,      132995,
>> ~
>>
>>
>> It gave me output :
>>
>> Q
>> Q
>>
>> 132995
>> 160020
>>
>> What is the Q? Cant I obtain the coordinates (132995, 160020) alone?
>>
>> Please let me know.
>> Thanks.
>>
>> On 11/28/05, Jason Stajich <[hidden email]> wrote:
>> Bio::SearchIO::psl can parse psl output.
>>
>> or more simply:
>>
>> while(<>) {
>>    if( /^psLayout/ ) { # if there is a header
>>    for( 1..4 ) { <> }  # take next 4 lines to skip the header
>>    }
>>   my @line = split;
>>   my ( $matches,$mismatches,$rep_matches,$n_count,
>>              $q_num_insert,$q_base_insert,
>>              $t_num_insert, $t_base_insert,
>>              $strand, $q_name, $q_length, $q_start,
>>              $q_end, $t_name, $t_length,$t_start, $t_end,
>> $block_count,
>>              $block_sizes,  $q_starts,      $t_starts
>>              ) = split;
>>
>>   #  query aln vals are  $q_start, and $q_end values
>>   # hit aln vals are $t_start, $t_end
>> }
>>
>> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>>
>>> Hi,
>>>
>>> I am using BLAT in a project.I am having simple .psl output files
>>> after
>>> running BLAT of a gene sequences against full chromosomal
>>> sequences.Doesanyone have a simple BLAT parse code. I am only
>>> interested in obtaining the
>>> alignment start and end positions on the target.
>>> --
>>> -Neeti
>>> Even my blood says, B positive
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> [hidden email]
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: need BLAT parse code

Henk van den Toorn
Hello Neeti,

Using the Bio::SearchIO module for this kind of output is just what Bioperl
is for!

Try the following code, for every hit you will get a result object. Because
psl can have several hits for each result (different fragments of the same
result), you will get several 'hit' objects for each result.
It is not the most beautiful code, but it shows the point.

I think it is easier this way than parsing the file, because you don't have
to be bothered with which column has which meaning. Moreover it handles the
different hits per query result, which you would otherwise have to do
yourself.

Greetings, Henk
---- snip ----

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SearchIO;
use Error qw(:try);

if (!defined $ARGV[0] || !-e $ARGV[0])  # Test if you have added a filename
at the command line
{
    die ("Usage: $0 filename.psl\n");
}

my $pslfile = $ARGV[0];

my $parser = new Bio::SearchIO(-file   => $pslfile, -format => 'psl');
my $result;

while ($result = $parser->next_result())
{
    print $result->query_name()."\t";
    print $result->query_length()."\t";
    print $result->num_hits()."\n";

    my $matches;
    foreach my $hit ($result->hits())
    {
        print "\t".$hit->name()."\t";

        try
        {
            $matches = $hit->matches('identities');
        }
        catch Bio::Root::Exception with
        {
            no warnings 'all';
            $matches = "ERROR";
        };


        print $matches."\t";

        print $hit->score()."\n";
    }
}

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Sean Davis
Sent: 29 November 2005 18:19
To: bioperl-ml List
Subject: Re: [Bioperl-l] need BLAT parse code

Neeti,

You could simply put the file in a text editor and take out the header.
Alternatively, type:

 blat

without any arguments.  You will notice that there are many options to blat,
one of which is -noHead, which suppresses the header.  Or, look at only
lines that begin with a number using a regular expression.

Ultimately, I think that it will serve you well to read a perl book, though,
as parsing a text file is an important and basic topic to grasp if you want
to use perl for data analysis.

Sean

On 11/29/05 9:43 AM, "Jason Stajich" <[hidden email]> wrote:

>
>
> Begin forwarded message:
>
>> From: neeti somaiya <[hidden email]>
>> Date: November 29, 2005 1:27:27 AM EST
>> To: Jason Stajich <[hidden email]>
>> Subject: Re: [Bioperl-l] need BLAT parse code
>>
>> I use the following code :
>>
>> open(FH,"output.psl");
>> while(<FH>)
>> {
>>     if( /^psLayout/ )
>>     {
>>           for( 1..4 ) { <> }
>>       }
>>      my @line = split;
>>      my ( $matches,$mismatches,$rep_matches,$n_count,
>>             $q_num_insert,$q_base_insert,
>>             $t_num_insert, $t_base_insert,
>>             $strand, $q_name, $q_length, $q_start,
>>             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>>             $block_sizes,  $q_starts,      $t_starts
>>             ) = split;
>>
>>
>>       print $t_start;
>>       print "\n";
>>       print $t_end;
>>
>> }
>>
>> for output.psl file :
>>
>> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap
>> strand  Q               Q       Q       Q       T
>> T       T       T       block   blockSizes      qStarts  tStarts
>>         match   match           count   bases   count
>> bases           name            size    start   end
>> name            size    start   end     count
>> ---------------------------------------------------------------------
>> -
>> ---------------------------------------------------------------------
>> -
>> -------------------
>> 27025   0       0       0       0       0       0       0
>> +       query_sequence3 27025   0       27025
>> database_sequence3      57701691        132995  160020  1
>> 27025,  0,      132995,
>> ~
>>
>>
>> It gave me output :
>>
>> Q
>> Q
>>
>> 132995
>> 160020
>>
>> What is the Q? Cant I obtain the coordinates (132995, 160020) alone?
>>
>> Please let me know.
>> Thanks.
>>
>> On 11/28/05, Jason Stajich <[hidden email]> wrote:
>> Bio::SearchIO::psl can parse psl output.
>>
>> or more simply:
>>
>> while(<>) {
>>    if( /^psLayout/ ) { # if there is a header
>>    for( 1..4 ) { <> }  # take next 4 lines to skip the header
>>    }
>>   my @line = split;
>>   my ( $matches,$mismatches,$rep_matches,$n_count,
>>              $q_num_insert,$q_base_insert,
>>              $t_num_insert, $t_base_insert,
>>              $strand, $q_name, $q_length, $q_start,
>>              $q_end, $t_name, $t_length,$t_start, $t_end,
>> $block_count,
>>              $block_sizes,  $q_starts,      $t_starts
>>              ) = split;
>>
>>   #  query aln vals are  $q_start, and $q_end values
>>   # hit aln vals are $t_start, $t_end }
>>
>> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>>
>>> Hi,
>>>
>>> I am using BLAT in a project.I am having simple .psl output files
>>> after running BLAT of a gene sequences against full chromosomal
>>> sequences.Doesanyone have a simple BLAT parse code. I am only
>>> interested in obtaining the alignment start and end positions on the
>>> target.
>>> --
>>> -Neeti
>>> Even my blood says, B positive
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> [hidden email]
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://portal.open-bio.org/mailman/listinfo/bioperl-l
Loading...