IUPAC code similarity

7 messages

IUPAC code similarity

shalabh sharma
Hi All,
      I have a few nucleotide sequences that are composed of IUPAC codes, like:
>test
VGSRVBSSSSSNSC

I also have a database made up of this kind of sequence. I want to
find sequences that are 100% similar to the query sequence.

Is there any BioPerl module to deal with this? I tried normal BLAST but it
didn't work.
Do I have to convert these sequences to 4-base codes, or is there any other
way out?

Thanks
Shalabh
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: IUPAC code similarity

Aaron Mackey-2
Convert the IUPAC code to a regular expression, and use regular expressions
(in Perl or grep or similar) to find 100% identical matches.

-Aaron
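
A minimal sketch of the regular-expression approach, using core Perl only. The lookup table holds the standard IUPAC nucleotide codes; the names iupac_to_regex, %db, seq1 and seq2 are made up for illustration, and the database is assumed to contain plain A/C/G/T sequences.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Build a regex source string with one character class per IUPAC code.
sub iupac_to_regex {
    my ($seq) = @_;
    my $re = '';
    for my $code (split //, uc $seq) {
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        my $bases = $IUPAC{$code};
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

my $pattern = iupac_to_regex('VGSRVBSSSSSNSC');

# Hypothetical database of unambiguous sequences (illustration only).
my %db = (
    seq1 => 'AGCAACGGGGGTGC',
    seq2 => 'TGCAACGGGGGTGC',
);

# \A and \z force the whole database sequence to match, not a substring.
my @hits = grep { $db{$_} =~ /\A$pattern\z/ } sort keys %db;
print "$_\n" for @hits;    # prints seq1 only: seq2 starts with T, which V excludes
```

Dropping the \A and \z anchors would turn this into a substring search, if the query may occur anywhere inside a longer database sequence.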

On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <[hidden email]> wrote:

> Hi All,
>      I have a few nucleotide sequences that are composed of IUPAC codes, like:
> >test
> VGSRVBSSSSSNSC
>
> I also have a database made up of this kind of sequence. I want to
> find sequences that are 100% similar to the query sequence.
>
> Is there any BioPerl module to deal with this? I tried normal BLAST but it
> didn't work.
> Do I have to convert these sequences to 4-base codes, or is there any other
> way out?
>
> Thanks
> Shalabh

Re: IUPAC code similarity

Roy Chaudhuri-3
Hi Shalabh,

The expand method in Bio::Tools::SeqPattern may be useful to convert
IUPAC codes to regular expressions:

$ perl -e 'use Bio::Tools::SeqPattern; print Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand'
[ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C

That won't work, though, if there are also ambiguity codes in your
database. For a non-BioPerl solution you could try fuzznuc from EMBOSS.

Cheers.
Roy.

On 17/09/2010 15:28, Aaron Mackey wrote:

> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron

Re: IUPAC code similarity

shalabh sharma
In reply to this post by Aaron Mackey-2
Thanks Aaron for your reply.
Actually I tried that first, but there is another problem: I have to divide
each query sequence into windows of size 5 with a 1-base shift, and it's not
possible to divide a regular expression that way.
So what I am trying is to convert those IUPAC codes to 4-base code sequences
and then do the normal search.
Now the problem is that I am not able to convert those IUPAC sequences to
normal ones. I am still trying to write a script but it's taking time.

Thanks
Shalabh
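
One way the conversion to 4-base sequences could be scripted, as a rough sketch in core Perl: it enumerates every concrete A/C/G/T sequence an IUPAC string can stand for. The name expand_iupac is made up for illustration. The number of results is the product of the per-position choices, so this is only practical for short strings such as 5-base windows.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Expand an IUPAC string into every unambiguous sequence it can represent.
sub expand_iupac {
    my ($seq) = @_;
    my @expanded = ('');
    for my $code (split //, uc $seq) {
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        my @bases = split //, $IUPAC{$code};
        my @next;
        # Extend every partial sequence built so far with each allowed base.
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for @bases;
        }
        @expanded = @next;
    }
    return @expanded;
}

my @seqs = expand_iupac('VGSRV');
print scalar(@seqs), " sequences\n";    # 3 * 1 * 2 * 2 * 3 = 36
print "$_\n" for @seqs[0 .. 3];         # AGCAA, AGCAC, AGCAG, AGCGA
```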


On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <[hidden email]> wrote:

> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron

Re: IUPAC code similarity

Aaron Mackey-2
In reply to this post by Roy Chaudhuri-3
If there are ambiguity codes in the database, then the expanded character class
has to also include the original ambiguity code; non-ambiguous nucleotides
must also be expanded to include all ambiguity codes that represent the
nucleotide.

-Aaron
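
A sketch of the rule as worded above, in core Perl. Under this reading S becomes [CGS] (its bases plus the code itself) and G becomes [BDGKNRSV] (every code whose base set contains G). The names db_aware_class and iupac_to_db_regex are made up for illustration, and this is only one possible reading of the rule: a looser variant would accept any database code that shares at least one base with the query code.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Character class for one query code when the database may hold IUPAC codes.
sub db_aware_class {
    my ($code) = @_;
    die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
    my $bases = $IUPAC{$code};
    if (length($bases) == 1) {
        # Unambiguous base: accept every code whose base set contains it.
        my @codes = grep { index($IUPAC{$_}, $bases) >= 0 } sort keys %IUPAC;
        return '[' . join('', @codes) . ']';
    }
    # Ambiguity code: accept its bases plus the code itself.
    return "[$bases$code]";
}

# Build the regex source for a whole IUPAC query string.
sub iupac_to_db_regex {
    my ($seq) = @_;
    my $re = '';
    $re .= db_aware_class($_) for split //, uc $seq;
    return $re;
}

my $pattern = iupac_to_db_regex('VGSRVBSSSSSNSC');

# The query matches itself and a fully unambiguous instance of itself.
for my $db_seq ('VGSRVBSSSSSNSC', 'AGCAACGGGGGTGC', 'NGSRVBSSSSSNSC') {
    my $verdict = $db_seq =~ /\A$pattern\z/ ? 'match' : 'no match';
    print "$db_seq  $verdict\n";
}
```

The third sequence does not match because, under this literal reading, the class for V is [ACGV] and does not include N.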

On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri <[hidden email]> wrote:

> Hi Shalabh,
>
> The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC
> codes to regular expressions:
>
> $ perl -e 'use Bio::Tools::SeqPattern; print Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand'
> [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C
>
> That won't work, though, if there are also ambiguity codes in your database.
> For a non-BioPerl solution you could try fuzznuc from EMBOSS.
>
> Cheers.
> Roy.

Re: IUPAC code similarity

Aaron Mackey-2
In reply to this post by shalabh sharma
Do your windowing/shifting on the unexpanded query sequences; then transform
the 5-bp queries into regular expressions.

-Aaron
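
A sketch of that order of operations in core Perl: cut the raw IUPAC query into overlapping 5-base windows first, then convert each window to a regular expression. The names windows and iupac_to_regex are made up for illustration.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Build a regex source string with one character class per IUPAC code.
sub iupac_to_regex {
    my ($seq) = @_;
    my $re = '';
    for my $code (split //, uc $seq) {
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        my $bases = $IUPAC{$code};
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

# Slide a window of $size codes along the raw IUPAC string, one base at a time.
sub windows {
    my ($seq, $size) = @_;
    my @windows;
    for my $start (0 .. length($seq) - $size) {
        push @windows, substr($seq, $start, $size);
    }
    return @windows;
}

my $query = 'VGSRVBSSSSSNSC';
for my $window (windows($query, 5)) {
    # One line per window: the raw IUPAC 5-mer and its regular expression.
    printf "%s  %s\n", $window, iupac_to_regex($window);
}
```

For the 14-base example query this yields 10 windows, from VGSRV to SSNSC. A query shorter than the window size yields no windows at all.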

On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma <[hidden email]> wrote:

> Thanks Aaron for your reply.
> Actually I tried that first, but there is another problem: I have to divide
> each query sequence into windows of size 5 with a 1-base shift, and it's not
> possible to divide a regular expression that way.
> So what I am trying is to convert those IUPAC codes to 4-base code sequences
> and then do the normal search.
> Now the problem is that I am not able to convert those IUPAC sequences to
> normal ones. I am still trying to write a script but it's taking time.
>
> Thanks
> Shalabh

Re: IUPAC code similarity

shalabh sharma
Thanks Aaron,
changing the query sequence worked well, but I am still struggling with the
database.

-Shalabh


On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey <[hidden email]> wrote:

> do your windowing/shifting on the unexpanded query sequences; then
> transform the 5-bp queries into regular expressions.
>
> -Aaron