issue with Bio::Tools::Run::Primer3Redux

classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

issue with Bio::Tools::Run::Primer3Redux

kumar Saurabh-2
Hi,

I need to integrate the primer3 module in one of our pipeline. In a process,
I was testing the initial code given on the CPAN website. But whenever I try
to run this program its giving me error...that "Cannot locate the Object
method add_target via the package Bio::Tools:Run::Primer3Redux...."

The line of codes I am using is as follows:

# design some primers.
  # the output will be put into temp.out
  use Bio::Tools::Primer3Redux;
  use Bio::Tools::Run::Primer3Redux;
  use Bio::SeqIO;

  my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
  my $seq = $seqio->next_seq;

  my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
                                            -path =>
"/home/singh/Downloads/primer3-2.2.3/src/primer3_core");

  # or after the fact you can change the program_name
  $primer3->program_name('my_superfast_primer3');

  unless ($primer3->executable) {
    print STDERR "primer3 can not be found. Is it installed?\n";
    exit(-1)
  }

  # set the maximum and minimum Tm of the primer
  $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);

  # Design the primers. This runs primer3 and returns a
  # Bio::Tools::Primer3::result object with the results
  # Primer3 can run in several modes (see explanation for
  # 'PRIMER_TASK' in the primer3 doccumentation). To run a task,
  # either call it by its PRIMER_TASK name as in these examples:
  $pcr_primer_results = $primer3->pick_pcr_primers($seq);
  $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
  $check_results = $primer3->check_primers();

  # Alternatively, explicitly set the PRIMER_TASK parameter and
  # use the generic 'run' method (this is mainly here for backwards
  # compatibility) :
  $primer3->PRIMER_TASK( 'pick_left_only' );
  $result = $primer3->run( $seq );

  # If no task is set and the 'run' method is called, primer3 will default
to
  # pick pcr primers.

  # see the Bio::Tools::Primer3Redux POD for
  # things that you can get from this. For example:

  print "There were ", $results->num_primer_pairs, " primer pairs\n";


Can anyone help me with this???


Best regards,
Kumar
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: issue with Bio::Tools::Run::Primer3Redux

Frank Schwach
Hi Kumar,

We are currently working on this module and you might want to check out
the latest version on Chris Field's github project:

https://github.com/cjfields/Bio-Tools-Primer3Redux

There will probably be some changes again once I get some time again to
work on a few points we discussed lately. You can also check out my repo
here:
https://github.com/fschwach/Bio-Tools-Primer3Redux

but I will certainly have to make changes to that code because I used
AUTOLAD in the last version, which is probably not a good idea.
My recommendation would be to use Chris' repo and see if that works for
you. If not, feedback would be much appreciated.

Cheers,

Frank




On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:

> Hi,
>
> I need to integrate the primer3 module in one of our pipeline. In a process,
> I was testing the initial code given on the CPAN website. But whenever I try
> to run this program its giving me error...that "Cannot locate the Object
> method add_target via the package Bio::Tools:Run::Primer3Redux...."
>
> The line of codes I am using is as follows:
>
> # design some primers.
>   # the output will be put into temp.out
>   use Bio::Tools::Primer3Redux;
>   use Bio::Tools::Run::Primer3Redux;
>   use Bio::SeqIO;
>
>   my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
>   my $seq = $seqio->next_seq;
>
>   my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
>                                             -path =>
> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");
>
>   # or after the fact you can change the program_name
>   $primer3->program_name('my_superfast_primer3');
>
>   unless ($primer3->executable) {
>     print STDERR "primer3 can not be found. Is it installed?\n";
>     exit(-1)
>   }
>
>   # set the maximum and minimum Tm of the primer
>   $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);
>
>   # Design the primers. This runs primer3 and returns a
>   # Bio::Tools::Primer3::result object with the results
>   # Primer3 can run in several modes (see explanation for
>   # 'PRIMER_TASK' in the primer3 doccumentation). To run a task,
>   # either call it by its PRIMER_TASK name as in these examples:
>   $pcr_primer_results = $primer3->pick_pcr_primers($seq);
>   $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
>   $check_results = $primer3->check_primers();
>
>   # Alternatively, explicitly set the PRIMER_TASK parameter and
>   # use the generic 'run' method (this is mainly here for backwards
>   # compatibility) :
>   $primer3->PRIMER_TASK( 'pick_left_only' );
>   $result = $primer3->run( $seq );
>
>   # If no task is set and the 'run' method is called, primer3 will default
> to
>   # pick pcr primers.
>
>   # see the Bio::Tools::Primer3Redux POD for
>   # things that you can get from this. For example:
>
>   print "There were ", $results->num_primer_pairs, " primer pairs\n";
>
>
> Can anyone help me with this???
>
>
> Best regards,
> Kumar
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: issue with Bio::Tools::Run::Primer3Redux

Fields, Christopher J
I mentioned off-list that this should be filed as a github issue so we don't lose track.  Unfortunately I can't get to it until next week (grant deadline).

chris

On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote:

> Hi Kumar,
>
> We are currently working on this module and you might want to check out
> the latest version on Chris Field's github project:
>
> https://github.com/cjfields/Bio-Tools-Primer3Redux
>
> There will probably be some changes again once I get some time again to
> work on a few points we discussed lately. You can also check out my repo
> here:
> https://github.com/fschwach/Bio-Tools-Primer3Redux
>
> but I will certainly have to make changes to that code because I used
> AUTOLAD in the last version, which is probably not a good idea.
> My recommendation would be to use Chris' repo and see if that works for
> you. If not, feedback would be much appreciated.
>
> Cheers,
>
> Frank
>
>
>
>
> On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
>> Hi,
>>
>> I need to integrate the primer3 module in one of our pipeline. In a process,
>> I was testing the initial code given on the CPAN website. But whenever I try
>> to run this program its giving me error...that "Cannot locate the Object
>> method add_target via the package Bio::Tools:Run::Primer3Redux...."
>>
>> The line of codes I am using is as follows:
>>
>> # design some primers.
>>  # the output will be put into temp.out
>>  use Bio::Tools::Primer3Redux;
>>  use Bio::Tools::Run::Primer3Redux;
>>  use Bio::SeqIO;
>>
>>  my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
>>  my $seq = $seqio->next_seq;
>>
>>  my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
>>                                            -path =>
>> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");
>>
>>  # or after the fact you can change the program_name
>>  $primer3->program_name('my_superfast_primer3');
>>
>>  unless ($primer3->executable) {
>>    print STDERR "primer3 can not be found. Is it installed?\n";
>>    exit(-1)
>>  }
>>
>>  # set the maximum and minimum Tm of the primer
>>  $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);
>>
>>  # Design the primers. This runs primer3 and returns a
>>  # Bio::Tools::Primer3::result object with the results
>>  # Primer3 can run in several modes (see explanation for
>>  # 'PRIMER_TASK' in the primer3 doccumentation). To run a task,
>>  # either call it by its PRIMER_TASK name as in these examples:
>>  $pcr_primer_results = $primer3->pick_pcr_primers($seq);
>>  $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
>>  $check_results = $primer3->check_primers();
>>
>>  # Alternatively, explicitly set the PRIMER_TASK parameter and
>>  # use the generic 'run' method (this is mainly here for backwards
>>  # compatibility) :
>>  $primer3->PRIMER_TASK( 'pick_left_only' );
>>  $result = $primer3->run( $seq );
>>
>>  # If no task is set and the 'run' method is called, primer3 will default
>> to
>>  # pick pcr primers.
>>
>>  # see the Bio::Tools::Primer3Redux POD for
>>  # things that you can get from this. For example:
>>
>>  print "There were ", $results->num_primer_pairs, " primer pairs\n";
>>
>>
>> Can anyone help me with this???
>>
>>
>> Best regards,
>> Kumar
>> _______________________________________________
>> Bioperl-l mailing list
>> [hidden email]
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: issue with Bio::Tools::Run::Primer3Redux

Frank Schwach
I also haven't had the time yet to work on this again but, yes, we need
to make sure we don't loose track of where we are.


On Thu, 2011-09-15 at 14:13 +0000, Fields, Christopher J wrote:

> I mentioned off-list that this should be filed as a github issue so we don't lose track.  Unfortunately I can't get to it until next week (grant deadline).
>
> chris
>
> On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote:
>
> > Hi Kumar,
> >
> > We are currently working on this module and you might want to check out
> > the latest version on Chris Field's github project:
> >
> > https://github.com/cjfields/Bio-Tools-Primer3Redux
> >
> > There will probably be some changes again once I get some time again to
> > work on a few points we discussed lately. You can also check out my repo
> > here:
> > https://github.com/fschwach/Bio-Tools-Primer3Redux
> >
> > but I will certainly have to make changes to that code because I used
> > AUTOLAD in the last version, which is probably not a good idea.
> > My recommendation would be to use Chris' repo and see if that works for
> > you. If not, feedback would be much appreciated.
> >
> > Cheers,
> >
> > Frank
> >
> >
> >
> >
> > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
> >> Hi,
> >>
> >> I need to integrate the primer3 module in one of our pipeline. In a process,
> >> I was testing the initial code given on the CPAN website. But whenever I try
> >> to run this program its giving me error...that "Cannot locate the Object
> >> method add_target via the package Bio::Tools:Run::Primer3Redux...."
> >>
> >> The line of codes I am using is as follows:
> >>
> >> # design some primers.
> >>  # the output will be put into temp.out
> >>  use Bio::Tools::Primer3Redux;
> >>  use Bio::Tools::Run::Primer3Redux;
> >>  use Bio::SeqIO;
> >>
> >>  my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
> >>  my $seq = $seqio->next_seq;
> >>
> >>  my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
> >>                                            -path =>
> >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");
> >>
> >>  # or after the fact you can change the program_name
> >>  $primer3->program_name('my_superfast_primer3');
> >>
> >>  unless ($primer3->executable) {
> >>    print STDERR "primer3 can not be found. Is it installed?\n";
> >>    exit(-1)
> >>  }
> >>
> >>  # set the maximum and minimum Tm of the primer
> >>  $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);
> >>
> >>  # Design the primers. This runs primer3 and returns a
> >>  # Bio::Tools::Primer3::result object with the results
> >>  # Primer3 can run in several modes (see explanation for
> >>  # 'PRIMER_TASK' in the primer3 doccumentation). To run a task,
> >>  # either call it by its PRIMER_TASK name as in these examples:
> >>  $pcr_primer_results = $primer3->pick_pcr_primers($seq);
> >>  $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
> >>  $check_results = $primer3->check_primers();
> >>
> >>  # Alternatively, explicitly set the PRIMER_TASK parameter and
> >>  # use the generic 'run' method (this is mainly here for backwards
> >>  # compatibility) :
> >>  $primer3->PRIMER_TASK( 'pick_left_only' );
> >>  $result = $primer3->run( $seq );
> >>
> >>  # If no task is set and the 'run' method is called, primer3 will default
> >> to
> >>  # pick pcr primers.
> >>
> >>  # see the Bio::Tools::Primer3Redux POD for
> >>  # things that you can get from this. For example:
> >>
> >>  print "There were ", $results->num_primer_pairs, " primer pairs\n";
> >>
> >>
> >> Can anyone help me with this???
> >>
> >>
> >> Best regards,
> >> Kumar
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> [hidden email]
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> > --
> > The Wellcome Trust Sanger Institute is operated by Genome Research
> > Limited, a charity registered in England with number 1021457 and a
> > company registered in England with number 2742969, whose registered
> > office is 215 Euston Road, London, NW1 2BE.
> > _______________________________________________
> > Bioperl-l mailing list
> > [hidden email]
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
In reply to this post by Fields, Christopher J
Hi all,

I needed to manipulate Bio::Seq objects with annotations and sequence
features to simulate molecular cloning techniques, e.g. to cut a vector
and insert a fragment into it while preserving all the annotations and
moving the features accordingly.
My main aim was to split features that span deletion/insertion sites in
a meaningful way, which can not be done with the currently availble
methods.
I have modified Bio::SeqUtils so that I have the following new methods:

delete
======
removes a segment from a sequence object and adjusts positions and types
of locations of sequence features:
- locations of features that span the deletion sites are turned into
Splits.
- locations that extend into the deleted region are turned to Fuzzy to
indicate that their true start/end was lost.
- locations contained inside the deleted regions are lost.
- other features are shifted according to the length of the deletion.

insert
======
adds a Bio::Seq object into another one between specified insertion
sites. This also affects the features on the recipient sequence:
- locations of features that span the insertion site are split but
position types are not turned to Fuzzy because no part of the original
feature is lost.
- other features are shifted according to the length of the insertion.

ligate
======
just for convenience. Supply a recipient, a fragment and one or two
sites to cut the recipient. Can also flip the fragment if required.
Simply calls delete [, reverse_complement_with_features] and insert in
turn.


One situation I haven't handled yet is a deletion that spans the origin
of a circular molecule but that should be a rare thing to do anyway. The
code currently throws an error if this is attempted.

I'm happy to contribute the code on Github if there is interest?
Comments on the handling of feature locations highly welcome!

Frank





--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Fields, Christopher J
Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).

chris

On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:

> Hi all,
>
> I needed to manipulate Bio::Seq objects with annotations and sequence
> features to simulate molecular cloning techniques, e.g. to cut a vector
> and insert a fragment into it while preserving all the annotations and
> moving the features accordingly.
> My main aim was to split features that span deletion/insertion sites in
> a meaningful way, which can not be done with the currently availble
> methods.
> I have modified Bio::SeqUtils so that I have the following new methods:
>
> delete
> ======
> removes a segment from a sequence object and adjusts positions and types
> of locations of sequence features:
> - locations of features that span the deletion sites are turned into
> Splits.
> - locations that extend into the deleted region are turned to Fuzzy to
> indicate that their true start/end was lost.
> - locations contained inside the deleted regions are lost.
> - other features are shifted according to the length of the deletion.
>
> insert
> ======
> adds a Bio::Seq object into another one between specified insertion
> sites. This also affects the features on the recipient sequence:
> - locations of features that span the insertion site are split but
> position types are not turned to Fuzzy because no part of the original
> feature is lost.
> - other features are shifted according to the length of the insertion.
>
> ligate
> ======
> just for convenience. Supply a recipient, a fragment and one or two
> sites to cut the recipient. Can also flip the fragment if required.
> Simply calls delete [, reverse_complement_with_features] and insert in
> turn.
>
>
> One situation I haven't handled yet is a deletion that spans the origin
> of a circular molecule but that should be a rare thing to do anyway. The
> code currently throws an error if this is attempted.
>
> I'm happy to contribute the code on Github if there is interest?
> Comments on the handling of feature locations highly welcome!
>
> Frank





_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
Hi Chris,

I have made the changes in a Git fork and made the pull request now.
If this is accepted into BioPerl I can also write a little SeqUtils
HOWTO for the BioPerl wiki.

Frank


On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:

> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>
> chris
>
> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>
> > Hi all,
> >
> > I needed to manipulate Bio::Seq objects with annotations and sequence
> > features to simulate molecular cloning techniques, e.g. to cut a vector
> > and insert a fragment into it while preserving all the annotations and
> > moving the features accordingly.
> > My main aim was to split features that span deletion/insertion sites in
> > a meaningful way, which can not be done with the currently availble
> > methods.
> > I have modified Bio::SeqUtils so that I have the following new methods:
> >
> > delete
> > ======
> > removes a segment from a sequence object and adjusts positions and types
> > of locations of sequence features:
> > - locations of features that span the deletion sites are turned into
> > Splits.
> > - locations that extend into the deleted region are turned to Fuzzy to
> > indicate that their true start/end was lost.
> > - locations contained inside the deleted regions are lost.
> > - other features are shifted according to the length of the deletion.
> >
> > insert
> > ======
> > adds a Bio::Seq object into another one between specified insertion
> > sites. This also affects the features on the recipient sequence:
> > - locations of features that span the insertion site are split but
> > position types are not turned to Fuzzy because no part of the original
> > feature is lost.
> > - other features are shifted according to the length of the insertion.
> >
> > ligate
> > ======
> > just for convenience. Supply a recipient, a fragment and one or two
> > sites to cut the recipient. Can also flip the fragment if required.
> > Simply calls delete [, reverse_complement_with_features] and insert in
> > turn.
> >
> >
> > One situation I haven't handled yet is a deletion that spans the origin
> > of a circular molecule but that should be a rare thing to do anyway. The
> > code currently throws an error if this is attempted.
> >
> > I'm happy to contribute the code on Github if there is interest?
> > Comments on the handling of feature locations highly welcome!
> >
> > Frank
>
>
>
>



--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Roy Chaudhuri-3
Hi Frank,

Looks good to me. One thing I'm not sure about - why do features
overlapping a deletion become fuzzy? That behaviour is in
trunc_with_features because it's intended to represent a taking a
subregion of a larger sequence, but if you're representing an internal
deletion then the boundaries of the overlapping feature aren't unknown,
they have been specifically altered. Maybe you could give absolute
coordinates, but add a note indicating that the 5' or 3' end has been
truncated by however many bases.

Cheers,
Roy.

On 10/01/2012 13:10, Frank Schwach wrote:

> Hi Chris,
>
> I have made the changes in a Git fork and made the pull request now.
> If this is accepted into BioPerl I can also write a little SeqUtils
> HOWTO for the BioPerl wiki.
>
> Frank
>
>
> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>>
>> chris
>>
>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>
>>> Hi all,
>>>
>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>> and insert a fragment into it while preserving all the annotations and
>>> moving the features accordingly.
>>> My main aim was to split features that span deletion/insertion sites in
>>> a meaningful way, which can not be done with the currently availble
>>> methods.
>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>
>>> delete
>>> ======
>>> removes a segment from a sequence object and adjusts positions and types
>>> of locations of sequence features:
>>> - locations of features that span the deletion sites are turned into
>>> Splits.
>>> - locations that extend into the deleted region are turned to Fuzzy to
>>> indicate that their true start/end was lost.
>>> - locations contained inside the deleted regions are lost.
>>> - other features are shifted according to the length of the deletion.
>>>
>>> insert
>>> ======
>>> adds a Bio::Seq object into another one between specified insertion
>>> sites. This also affects the features on the recipient sequence:
>>> - locations of features that span the insertion site are split but
>>> position types are not turned to Fuzzy because no part of the original
>>> feature is lost.
>>> - other features are shifted according to the length of the insertion.
>>>
>>> ligate
>>> ======
>>> just for convenience. Supply a recipient, a fragment and one or two
>>> sites to cut the recipient. Can also flip the fragment if required.
>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>> turn.
>>>
>>>
>>> One situation I haven't handled yet is a deletion that spans the origin
>>> of a circular molecule but that should be a rare thing to do anyway. The
>>> code currently throws an error if this is attempted.
>>>
>>> I'm happy to contribute the code on Github if there is interest?
>>> Comments on the handling of feature locations highly welcome!
>>>
>>> Frank
>>
>>
>>
>>
>
>
>

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
Hi Roy,

Sorry, I hadn't explained that very well: it's not the outer boundaries
of the feature that become fuzzy but the "inner" ones of the split
locations:

 --------------------           a feature's location
==========xxxx================= sequence


 ---------                     sublocation 1
          --------             sublocation 2
===============================
         
x= sequence to delete
The feature's location has changed from Simple to Split.

Sublocation 1:
start is still EXACT and has not changed
end is now AFTER because this is not a true end of the feature

Sublocation 2:
start is BEFORE
end is EXACT (but shifted)

I hope this makes more sense(?)

Cheers,

Frank



On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:

> Hi Frank,
>
> Looks good to me. One thing I'm not sure about - why do features
> overlapping a deletion become fuzzy? That behaviour is in
> trunc_with_features because it's intended to represent a taking a
> subregion of a larger sequence, but if you're representing an internal
> deletion then the boundaries of the overlapping feature aren't unknown,
> they have been specifically altered. Maybe you could give absolute
> coordinates, but add a note indicating that the 5' or 3' end has been
> truncated by however many bases.
>
> Cheers,
> Roy.
>
> On 10/01/2012 13:10, Frank Schwach wrote:
> > Hi Chris,
> >
> > I have made the changes in a Git fork and made the pull request now.
> > If this is accepted into BioPerl I can also write a little SeqUtils
> > HOWTO for the BioPerl wiki.
> >
> > Frank
> >
> >
> > On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
> >> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
> >>
> >> chris
> >>
> >> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
> >>
> >>> Hi all,
> >>>
> >>> I needed to manipulate Bio::Seq objects with annotations and sequence
> >>> features to simulate molecular cloning techniques, e.g. to cut a vector
> >>> and insert a fragment into it while preserving all the annotations and
> >>> moving the features accordingly.
> >>> My main aim was to split features that span deletion/insertion sites in
> >>> a meaningful way, which can not be done with the currently availble
> >>> methods.
> >>> I have modified Bio::SeqUtils so that I have the following new methods:
> >>>
> >>> delete
> >>> ======
> >>> removes a segment from a sequence object and adjusts positions and types
> >>> of locations of sequence features:
> >>> - locations of features that span the deletion sites are turned into
> >>> Splits.
> >>> - locations that extend into the deleted region are turned to Fuzzy to
> >>> indicate that their true start/end was lost.
> >>> - locations contained inside the deleted regions are lost.
> >>> - other features are shifted according to the length of the deletion.
> >>>
> >>> insert
> >>> ======
> >>> adds a Bio::Seq object into another one between specified insertion
> >>> sites. This also affects the features on the recipient sequence:
> >>> - locations of features that span the insertion site are split but
> >>> position types are not turned to Fuzzy because no part of the original
> >>> feature is lost.
> >>> - other features are shifted according to the length of the insertion.
> >>>
> >>> ligate
> >>> ======
> >>> just for convenience. Supply a recipient, a fragment and one or two
> >>> sites to cut the recipient. Can also flip the fragment if required.
> >>> Simply calls delete [, reverse_complement_with_features] and insert in
> >>> turn.
> >>>
> >>>
> >>> One situation I haven't handled yet is a deletion that spans the origin
> >>> of a circular molecule but that should be a rare thing to do anyway. The
> >>> code currently throws an error if this is attempted.
> >>>
> >>> I'm happy to contribute the code on Github if there is interest?
> >>> Comments on the handling of feature locations highly welcome!
> >>>
> >>> Frank
> >>
> >>
> >>
> >>
> >
> >
> >
>



--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Roy Chaudhuri-3
I think it's me that didn't explain very well - I was talking about
overlapping (rather than spanning) a deletion, although I think the same
principle applies to the spanning example you gave. Here's some test code:

#!/usr/bin/perl
use warnings FATAL=>qw(all);
use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqUtils;
use Bio::SeqFeature::Generic;
my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
                                                    -start=>2,
                                                    -end=>9));

$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
                                                    -start=>2,
                                                    -end=>5));
my $out=Bio::SeqIO->newFh(-format=>'genbank');
my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
print $out $trunc;


This currently outputs:
LOCUS       seq-accession_number            7 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
      CDS             join(2..>3,<4..6)
      CDS             2..>3
ORIGIN
         1 aaaaaaa
//

However, I was suggesting that the feature table should be something like:
CDS             join(2..3,4..6)
                 /note="3 bp internal deletion"
CDS             join(2..3)
                 /note="2 bp deleted from 3' end"

Fuzzy locations are intended to represent features which have boundaries
spanning outside of the sequence. For a defined deletion that's not the
case, the boundaries of the feature aren't unknown, they have been
specifically altered.

Hope this is clearer.
Cheers,
Roy.

On 10/01/2012 16:47, Frank Schwach wrote:

> Hi Roy,
>
> Sorry, I hadn't explained that very well: it's not the outer boundaries
> of the feature that become fuzzy but the "inner" ones of the split
> locations:
>
>   --------------------           a feature's location
> ==========xxxx================= sequence
>
>
>   ---------                     sublocation 1
>            --------             sublocation 2
> ===============================
>
> x= sequence to delete
> The feature's location has changed from Simple to Split.
>
> Sublocation 1:
> start is still EXACT and has not changed
> end is now AFTER because this is not a true end of the feature
>
> Sublocation 2:
> start is BEFORE
> end is EXACT (but shifted)
>
> I hope this makes more sense(?)
>
> Cheers,
>
> Frank
>
>
>
> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>> Hi Frank,
>>
>> Looks good to me. One thing I'm not sure about - why do features
>> overlapping a deletion become fuzzy? That behaviour is in
>> trunc_with_features because it's intended to represent a taking a
>> subregion of a larger sequence, but if you're representing an internal
>> deletion then the boundaries of the overlapping feature aren't unknown,
>> they have been specifically altered. Maybe you could give absolute
>> coordinates, but add a note indicating that the 5' or 3' end has been
>> truncated by however many bases.
>>
>> Cheers,
>> Roy.
>>
>> On 10/01/2012 13:10, Frank Schwach wrote:
>>> Hi Chris,
>>>
>>> I have made the changes in a Git fork and made the pull request now.
>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>> HOWTO for the BioPerl wiki.
>>>
>>> Frank
>>>
>>>
>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>>>>
>>>> chris
>>>>
>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>> and insert a fragment into it while preserving all the annotations and
>>>>> moving the features accordingly.
>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>> a meaningful way, which can not be done with the currently availble
>>>>> methods.
>>>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>>>
>>>>> delete
>>>>> ======
>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>> of locations of sequence features:
>>>>> - locations of features that span the deletion sites are turned into
>>>>> Splits.
>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>> indicate that their true start/end was lost.
>>>>> - locations contained inside the deleted regions are lost.
>>>>> - other features are shifted according to the length of the deletion.
>>>>>
>>>>> insert
>>>>> ======
>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>> sites. This also affects the features on the recipient sequence:
>>>>> - locations of features that span the insertion site are split but
>>>>> position types are not turned to Fuzzy because no part of the original
>>>>> feature is lost.
>>>>> - other features are shifted according to the length of the insertion.
>>>>>
>>>>> ligate
>>>>> ======
>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>> turn.
>>>>>
>>>>>
>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>> of a circular molecule but that should be a rare thing to do anyway. The
>>>>> code currently throws an error if this is attempted.
>>>>>
>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>> Comments on the handling of feature locations highly welcome!
>>>>>
>>>>> Frank
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Roy Chaudhuri-3
Or without the typo:

CDS             join(2..3,4..6)
                 /note="3 bp internal deletion"
CDS             2..3
                 /note="2 bp deleted from 3' end"

On 10/01/2012 17:27, Roy Chaudhuri wrote:

> I think it's me that didn't explain very well - I was talking about
> overlapping (rather than spanning) a deletion, although I think the same
> principle applies to the spanning example you gave. Here's some test code:
>
> #!/usr/bin/perl
> use warnings FATAL=>qw(all);
> use strict;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::SeqUtils;
> use Bio::SeqFeature::Generic;
> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>                                                      -start=>2,
>                                                      -end=>9));
>
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>                                                      -start=>2,
>                                                      -end=>5));
> my $out=Bio::SeqIO->newFh(-format=>'genbank');
> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
> print $out $trunc;
>
>
> This currently outputs:
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>        CDS             join(2..>3,<4..6)
>        CDS             2..>3
> ORIGIN
>           1 aaaaaaa
> //
>
> However, I was suggesting that the feature table should be something like:
> CDS             join(2..3,4..6)
>                   /note="3 bp internal deletion"
> CDS             join(2..3)
>                   /note="2 bp deleted from 3' end"
>
> Fuzzy locations are intended to represent features which have boundaries
> spanning outside of the sequence. For a defined deletion that's not the
> case, the boundaries of the feature aren't unknown, they have been
> specifically altered.
>
> Hope this is clearer.
> Cheers,
> Roy.
>
> On 10/01/2012 16:47, Frank Schwach wrote:
>> Hi Roy,
>>
>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>> of the feature that become fuzzy but the "inner" ones of the split
>> locations:
>>
>>    --------------------           a feature's location
>> ==========xxxx================= sequence
>>
>>
>>    ---------                     sublocation 1
>>             --------             sublocation 2
>> ===============================
>>
>> x= sequence to delete
>> The feature's location has changed from Simple to Split.
>>
>> Sublocation 1:
>> start is still EXACT and has not changed
>> end is now AFTER because this is not a true end of the feature
>>
>> Sublocation 2:
>> start is BEFORE
>> end is EXACT (but shifted)
>>
>> I hope this makes more sense(?)
>>
>> Cheers,
>>
>> Frank
>>
>>
>>
>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>> Hi Frank,
>>>
>>> Looks good to me. One thing I'm not sure about - why do features
>>> overlapping a deletion become fuzzy? That behaviour is in
>>> trunc_with_features because it's intended to represent a taking a
>>> subregion of a larger sequence, but if you're representing an internal
>>> deletion then the boundaries of the overlapping feature aren't unknown,
>>> they have been specifically altered. Maybe you could give absolute
>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>> truncated by however many bases.
>>>
>>> Cheers,
>>> Roy.
>>>
>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>> Hi Chris,
>>>>
>>>> I have made the changes in a Git fork and made the pull request now.
>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>> HOWTO for the BioPerl wiki.
>>>>
>>>> Frank
>>>>
>>>>
>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>>> and insert a fragment into it while preserving all the annotations and
>>>>>> moving the features accordingly.
>>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>> methods.
>>>>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>>>>
>>>>>> delete
>>>>>> ======
>>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>>> of locations of sequence features:
>>>>>> - locations of features that span the deletion sites are turned into
>>>>>> Splits.
>>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>>> indicate that their true start/end was lost.
>>>>>> - locations contained inside the deleted regions are lost.
>>>>>> - other features are shifted according to the length of the deletion.
>>>>>>
>>>>>> insert
>>>>>> ======
>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>> - locations of features that span the insertion site are split but
>>>>>> position types are not turned to Fuzzy because no part of the original
>>>>>> feature is lost.
>>>>>> - other features are shifted according to the length of the insertion.
>>>>>>
>>>>>> ligate
>>>>>> ======
>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>>> turn.
>>>>>>
>>>>>>
>>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>>> of a circular molecule but that should be a rare thing to do anyway. The
>>>>>> code currently throws an error if this is attempted.
>>>>>>
>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>
>>>>>> Frank
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
In reply to this post by Roy Chaudhuri-3
Hi Roy,

I see what you mean and I had the same thought but somehow I liked the
fuzzy locations more because it suggests to me that the feature is not
complete (anymore). But I do take your point that this is not the
intended use of this location type. I can add notes as you suggest but I
guess I should also add a misc_feature "deletion", in your example
between bases 3 and 4, to make it clearer that something has happened to
the feature.

Frank



On 10/01/12 17:27, Roy Chaudhuri wrote:

> I think it's me that didn't explain very well - I was talking about
> overlapping (rather than spanning) a deletion, although I think the
> same principle applies to the spanning example you gave. Here's some
> test code:
>
> #!/usr/bin/perl
> use warnings FATAL=>qw(all);
> use strict;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::SeqUtils;
> use Bio::SeqFeature::Generic;
> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>                                                    -start=>2,
>                                                    -end=>9));
>
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>                                                    -start=>2,
>                                                    -end=>5));
> my $out=Bio::SeqIO->newFh(-format=>'genbank');
> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
> print $out $trunc;
>
>
> This currently outputs:
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>      CDS             join(2..>3,<4..6)
>      CDS             2..>3
> ORIGIN
>         1 aaaaaaa
> //
>
> However, I was suggesting that the feature table should be something
> like:
> CDS             join(2..3,4..6)
>                 /note="3 bp internal deletion"
> CDS             join(2..3)
>                 /note="2 bp deleted from 3' end"
>
> Fuzzy locations are intended to represent features which have
> boundaries spanning outside of the sequence. For a defined deletion
> that's not the case, the boundaries of the feature aren't unknown,
> they have been specifically altered.
>
> Hope this is clearer.
> Cheers,
> Roy.
>
> On 10/01/2012 16:47, Frank Schwach wrote:
>> Hi Roy,
>>
>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>> of the feature that become fuzzy but the "inner" ones of the split
>> locations:
>>
>>   --------------------           a feature's location
>> ==========xxxx================= sequence
>>
>>
>>   ---------                     sublocation 1
>>            --------             sublocation 2
>> ===============================
>>
>> x= sequence to delete
>> The feature's location has changed from Simple to Split.
>>
>> Sublocation 1:
>> start is still EXACT and has not changed
>> end is now AFTER because this is not a true end of the feature
>>
>> Sublocation 2:
>> start is BEFORE
>> end is EXACT (but shifted)
>>
>> I hope this makes more sense(?)
>>
>> Cheers,
>>
>> Frank
>>
>>
>>
>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>> Hi Frank,
>>>
>>> Looks good to me. One thing I'm not sure about - why do features
>>> overlapping a deletion become fuzzy? That behaviour is in
>>> trunc_with_features because it's intended to represent a taking a
>>> subregion of a larger sequence, but if you're representing an internal
>>> deletion then the boundaries of the overlapping feature aren't unknown,
>>> they have been specifically altered. Maybe you could give absolute
>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>> truncated by however many bases.
>>>
>>> Cheers,
>>> Roy.
>>>
>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>> Hi Chris,
>>>>
>>>> I have made the changes in a Git fork and made the pull request now.
>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>> HOWTO for the BioPerl wiki.
>>>>
>>>> Frank
>>>>
>>>>
>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>> fork of the code on Github with a pull request (as you already
>>>>> know, being a contributor to the Primer3 modules).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>> sequence
>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
>>>>>> vector
>>>>>> and insert a fragment into it while preserving all the
>>>>>> annotations and
>>>>>> moving the features accordingly.
>>>>>> My main aim was to split features that span deletion/insertion
>>>>>> sites in
>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>> methods.
>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>> methods:
>>>>>>
>>>>>> delete
>>>>>> ======
>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>> and types
>>>>>> of locations of sequence features:
>>>>>> - locations of features that span the deletion sites are turned into
>>>>>> Splits.
>>>>>> - locations that extend into the deleted region are turned to
>>>>>> Fuzzy to
>>>>>> indicate that their true start/end was lost.
>>>>>> - locations contained inside the deleted regions are lost.
>>>>>> - other features are shifted according to the length of the
>>>>>> deletion.
>>>>>>
>>>>>> insert
>>>>>> ======
>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>> - locations of features that span the insertion site are split but
>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>> original
>>>>>> feature is lost.
>>>>>> - other features are shifted according to the length of the
>>>>>> insertion.
>>>>>>
>>>>>> ligate
>>>>>> ======
>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>> insert in
>>>>>> turn.
>>>>>>
>>>>>>
>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>> origin
>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>> anyway. The
>>>>>> code currently throws an error if this is attempted.
>>>>>>
>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>
>>>>>> Frank
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Fields, Christopher J
Have to agree with Roy, in cases where the bounds of the deletion are defined, the truncated features wouldn't be fuzzy (e.g. the start and end would be known, the feature would just be truncated).  Same with other mutations.

chris

On Jan 10, 2012, at 4:35 PM, Frank Schwach wrote:

> Hi Roy,
>
> I see what you mean and I had the same thought but somehow I liked the fuzzy locations more because it suggests to me that the feature is not complete (anymore). But I do take your point that this is not the intended use of this location type. I can add notes as you suggest but I guess I should also add a misc_feature "deletion", in your example between bases 3 and 4, to make it clearer that something has happened to the feature.
>
> Frank
>
>
>
> On 10/01/12 17:27, Roy Chaudhuri wrote:
>> I think it's me that didn't explain very well - I was talking about overlapping (rather than spanning) a deletion, although I think the same principle applies to the spanning example you gave. Here's some test code:
>>
>> #!/usr/bin/perl
>> use warnings FATAL=>qw(all);
>> use strict;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::SeqUtils;
>> use Bio::SeqFeature::Generic;
>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>                                                   -start=>2,
>>                                                   -end=>9));
>>
>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>                                                   -start=>2,
>>                                                   -end=>5));
>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>> print $out $trunc;
>>
>>
>> This currently outputs:
>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>> ACCESSION   unknown
>> FEATURES             Location/Qualifiers
>>     CDS             join(2..>3,<4..6)
>>     CDS             2..>3
>> ORIGIN
>>        1 aaaaaaa
>> //
>>
>> However, I was suggesting that the feature table should be something like:
>> CDS             join(2..3,4..6)
>>                /note="3 bp internal deletion"
>> CDS             join(2..3)
>>                /note="2 bp deleted from 3' end"
>>
>> Fuzzy locations are intended to represent features which have boundaries spanning outside of the sequence. For a defined deletion that's not the case, the boundaries of the feature aren't unknown, they have been specifically altered.
>>
>> Hope this is clearer.
>> Cheers,
>> Roy.
>>
>> On 10/01/2012 16:47, Frank Schwach wrote:
>>> Hi Roy,
>>>
>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>> of the feature that become fuzzy but the "inner" ones of the split
>>> locations:
>>>
>>>  --------------------           a feature's location
>>> ==========xxxx================= sequence
>>>
>>>
>>>  ---------                     sublocation 1
>>>           --------             sublocation 2
>>> ===============================
>>>
>>> x= sequence to delete
>>> The feature's location has changed from Simple to Split.
>>>
>>> Sublocation 1:
>>> start is still EXACT and has not changed
>>> end is now AFTER because this is not a true end of the feature
>>>
>>> Sublocation 2:
>>> start is BEFORE
>>> end is EXACT (but shifted)
>>>
>>> I hope this makes more sense(?)
>>>
>>> Cheers,
>>>
>>> Frank
>>>
>>>
>>>
>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>> Hi Frank,
>>>>
>>>> Looks good to me. One thing I'm not sure about - why do features
>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>> trunc_with_features because it's intended to represent a taking a
>>>> subregion of a larger sequence, but if you're representing an internal
>>>> deletion then the boundaries of the overlapping feature aren't unknown,
>>>> they have been specifically altered. Maybe you could give absolute
>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>> truncated by however many bases.
>>>>
>>>> Cheers,
>>>> Roy.
>>>>
>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>> Hi Chris,
>>>>>
>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>> HOWTO for the BioPerl wiki.
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>>>> and insert a fragment into it while preserving all the annotations and
>>>>>>> moving the features accordingly.
>>>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>>> methods.
>>>>>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>>>>>
>>>>>>> delete
>>>>>>> ======
>>>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>>>> of locations of sequence features:
>>>>>>> - locations of features that span the deletion sites are turned into
>>>>>>> Splits.
>>>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>>>> indicate that their true start/end was lost.
>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>> - other features are shifted according to the length of the deletion.
>>>>>>>
>>>>>>> insert
>>>>>>> ======
>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>> - locations of features that span the insertion site are split but
>>>>>>> position types are not turned to Fuzzy because no part of the original
>>>>>>> feature is lost.
>>>>>>> - other features are shifted according to the length of the insertion.
>>>>>>>
>>>>>>> ligate
>>>>>>> ======
>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>>>> turn.
>>>>>>>
>>>>>>>
>>>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>>>> of a circular molecule but that should be a rare thing to do anyway. The
>>>>>>> code currently throws an error if this is attempted.
>>>>>>>
>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>
>>>>>>> Frank
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

Chris Fields
Senior Research Scientist
National Center for Supercomputing Applications
Institute for Genomic Biology
University of Illinois at Urbana-Champaign


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
In reply to this post by Roy Chaudhuri-3
Hi Roy and Chris,

I have made the changes to the code now. As you suggested, feature ends
no longer change type and I insert a note instead to inform about the
deletion (or insertion), showing the length and position.
I have also added a feature to annotate deletion sites themselves (with
IN-BETWEEN locations).

Roy's test script now prints:

LOCUS       seq-accession_number            7 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
      CDS             join(2..3,4..6)
                      /note="3bp internal deletion between pos 3 and 4"
      CDS             2..3
                      /note="2bp deleted from feature end"
      misc_feature    3^4
                      /note="deletion of 3bp"
ORIGIN
         1 aaaaaaa
//


or, if you add strand information (-1 in this case) to the second feature:

LOCUS       seq-accession_number            7 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
      CDS             join(2..3,4..6)
                      /note="3bp internal deletion between pos 3 and 4"
      CDS             complement(2..3)
                      /note="2bp deleted from feature 5' end"
      misc_feature    3^4
                      /note="deletion of 3bp"
ORIGIN
         1 aaaaaaa
//

I have comitted this along with some bugfixes to my master branch on GitHub
https://github.com/fschwach/bioperl-live
so it's now also in my existing pull request.

I'm still wondering if cloning the sequence objects rather than calling
'new' on their respective classes would be an option inside 'delete' and
'insert'?
I'm experimenting with this for my own purposes because I have to work
with custom sub-classes of Bio::Seq which have additional attributes and
therefore set 'can_call_new' to false.
Without cloning the objects, I first have to convert the custom
Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
Is there any reason why something like Clone::Fast should not be used in
this case? It seems to work for me but there may be situations where
this is going to blow up which I am not aware of.
Cloning rather than calling new could be made an option in
Bio::SeqUtils. I have most of the code for that already.

Frank











On 10/01/12 17:31, Roy Chaudhuri wrote:

> Or without the typo:
>
> CDS             join(2..3,4..6)
>                 /note="3 bp internal deletion"
> CDS             2..3
>                 /note="2 bp deleted from 3' end"
>
> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>> I think it's me that didn't explain very well - I was talking about
>> overlapping (rather than spanning) a deletion, although I think the same
>> principle applies to the spanning example you gave. Here's some test
>> code:
>>
>> #!/usr/bin/perl
>> use warnings FATAL=>qw(all);
>> use strict;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::SeqUtils;
>> use Bio::SeqFeature::Generic;
>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>                                                      -start=>2,
>>                                                      -end=>9));
>>
>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>                                                      -start=>2,
>>                                                      -end=>5));
>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>> print $out $trunc;
>>
>>
>> This currently outputs:
>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>> ACCESSION   unknown
>> FEATURES             Location/Qualifiers
>>        CDS             join(2..>3,<4..6)
>>        CDS             2..>3
>> ORIGIN
>>           1 aaaaaaa
>> //
>>
>> However, I was suggesting that the feature table should be something
>> like:
>> CDS             join(2..3,4..6)
>>                   /note="3 bp internal deletion"
>> CDS             join(2..3)
>>                   /note="2 bp deleted from 3' end"
>>
>> Fuzzy locations are intended to represent features which have boundaries
>> spanning outside of the sequence. For a defined deletion that's not the
>> case, the boundaries of the feature aren't unknown, they have been
>> specifically altered.
>>
>> Hope this is clearer.
>> Cheers,
>> Roy.
>>
>> On 10/01/2012 16:47, Frank Schwach wrote:
>>> Hi Roy,
>>>
>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>> of the feature that become fuzzy but the "inner" ones of the split
>>> locations:
>>>
>>>    --------------------           a feature's location
>>> ==========xxxx================= sequence
>>>
>>>
>>>    ---------                     sublocation 1
>>>             --------             sublocation 2
>>> ===============================
>>>
>>> x= sequence to delete
>>> The feature's location has changed from Simple to Split.
>>>
>>> Sublocation 1:
>>> start is still EXACT and has not changed
>>> end is now AFTER because this is not a true end of the feature
>>>
>>> Sublocation 2:
>>> start is BEFORE
>>> end is EXACT (but shifted)
>>>
>>> I hope this makes more sense(?)
>>>
>>> Cheers,
>>>
>>> Frank
>>>
>>>
>>>
>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>> Hi Frank,
>>>>
>>>> Looks good to me. One thing I'm not sure about - why do features
>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>> trunc_with_features because it's intended to represent a taking a
>>>> subregion of a larger sequence, but if you're representing an internal
>>>> deletion then the boundaries of the overlapping feature aren't
>>>> unknown,
>>>> they have been specifically altered. Maybe you could give absolute
>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>> truncated by however many bases.
>>>>
>>>> Cheers,
>>>> Roy.
>>>>
>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>> Hi Chris,
>>>>>
>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>> HOWTO for the BioPerl wiki.
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>> fork of the code on Github with a pull request (as you already
>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>> sequence
>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
>>>>>>> vector
>>>>>>> and insert a fragment into it while preserving all the
>>>>>>> annotations and
>>>>>>> moving the features accordingly.
>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>> sites in
>>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>>> methods.
>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>> methods:
>>>>>>>
>>>>>>> delete
>>>>>>> ======
>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>> and types
>>>>>>> of locations of sequence features:
>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>> into
>>>>>>> Splits.
>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>> Fuzzy to
>>>>>>> indicate that their true start/end was lost.
>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>> - other features are shifted according to the length of the
>>>>>>> deletion.
>>>>>>>
>>>>>>> insert
>>>>>>> ======
>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>> - locations of features that span the insertion site are split but
>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>> original
>>>>>>> feature is lost.
>>>>>>> - other features are shifted according to the length of the
>>>>>>> insertion.
>>>>>>>
>>>>>>> ligate
>>>>>>> ======
>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>> insert in
>>>>>>> turn.
>>>>>>>
>>>>>>>
>>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>>> origin
>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>> anyway. The
>>>>>>> code currently throws an error if this is attempted.
>>>>>>>
>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>
>>>>>>> Frank
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>


--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Roy Chaudhuri-3
Hi Frank,

Looks great, I like the use of between locations, didn't think of that.

It was suggested that I avoid using Clone for cat, trunc_with_features
etc. to avoid adding a dependency (which may no longer be an issue) and
because it would cause problems for Bio::Seq implementations that use a
database as the back-end. Maybe you could add it as an option, but keep
the default as is?

Cheers,
Roy.

On 11/01/2012 18:16, Frank Schwach wrote:

> Hi Roy and Chris,
>
> I have made the changes to the code now. As you suggested, feature ends
> no longer change type and I insert a note instead to inform about the
> deletion (or insertion), showing the length and position.
> I have also added a feature to annotate deletion sites themselves (with
> IN-BETWEEN locations).
>
> Roy's test script now prints:
>
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>        CDS             join(2..3,4..6)
>                        /note="3bp internal deletion between pos 3 and 4"
>        CDS             2..3
>                        /note="2bp deleted from feature end"
>        misc_feature    3^4
>                        /note="deletion of 3bp"
> ORIGIN
>           1 aaaaaaa
> //
>
>
> or, if you add strand information (-1 in this case) to the second feature:
>
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>        CDS             join(2..3,4..6)
>                        /note="3bp internal deletion between pos 3 and 4"
>        CDS             complement(2..3)
>                        /note="2bp deleted from feature 5' end"
>        misc_feature    3^4
>                        /note="deletion of 3bp"
> ORIGIN
>           1 aaaaaaa
> //
>
> I have comitted this along with some bugfixes to my master branch on GitHub
> https://github.com/fschwach/bioperl-live
> so it's now also in my existing pull request.
>
> I'm still wondering if cloning the sequence objects rather than calling
> 'new' on their respective classes would be an option inside 'delete' and
> 'insert'?
> I'm experimenting with this for my own purposes because I have to work
> with custom sub-classes of Bio::Seq which have additional attributes and
> therefore set 'can_call_new' to false.
> Without cloning the objects, I first have to convert the custom
> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
> Is there any reason why something like Clone::Fast should not be used in
> this case? It seems to work for me but there may be situations where
> this is going to blow up which I am not aware of.
> Cloning rather than calling new could be made an option in
> Bio::SeqUtils. I have most of the code for that already.
>
> Frank
>
>
>
>
>
>
>
>
>
>
>
> On 10/01/12 17:31, Roy Chaudhuri wrote:
>> Or without the typo:
>>
>> CDS             join(2..3,4..6)
>>                  /note="3 bp internal deletion"
>> CDS             2..3
>>                  /note="2 bp deleted from 3' end"
>>
>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>> I think it's me that didn't explain very well - I was talking about
>>> overlapping (rather than spanning) a deletion, although I think the same
>>> principle applies to the spanning example you gave. Here's some test
>>> code:
>>>
>>> #!/usr/bin/perl
>>> use warnings FATAL=>qw(all);
>>> use strict;
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> use Bio::SeqUtils;
>>> use Bio::SeqFeature::Generic;
>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>                                                       -start=>2,
>>>                                                       -end=>9));
>>>
>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>                                                       -start=>2,
>>>                                                       -end=>5));
>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>> print $out $trunc;
>>>
>>>
>>> This currently outputs:
>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>> ACCESSION   unknown
>>> FEATURES             Location/Qualifiers
>>>         CDS             join(2..>3,<4..6)
>>>         CDS             2..>3
>>> ORIGIN
>>>            1 aaaaaaa
>>> //
>>>
>>> However, I was suggesting that the feature table should be something
>>> like:
>>> CDS             join(2..3,4..6)
>>>                    /note="3 bp internal deletion"
>>> CDS             join(2..3)
>>>                    /note="2 bp deleted from 3' end"
>>>
>>> Fuzzy locations are intended to represent features which have boundaries
>>> spanning outside of the sequence. For a defined deletion that's not the
>>> case, the boundaries of the feature aren't unknown, they have been
>>> specifically altered.
>>>
>>> Hope this is clearer.
>>> Cheers,
>>> Roy.
>>>
>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>> Hi Roy,
>>>>
>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>> locations:
>>>>
>>>>     --------------------           a feature's location
>>>> ==========xxxx================= sequence
>>>>
>>>>
>>>>     ---------                     sublocation 1
>>>>              --------             sublocation 2
>>>> ===============================
>>>>
>>>> x= sequence to delete
>>>> The feature's location has changed from Simple to Split.
>>>>
>>>> Sublocation 1:
>>>> start is still EXACT and has not changed
>>>> end is now AFTER because this is not a true end of the feature
>>>>
>>>> Sublocation 2:
>>>> start is BEFORE
>>>> end is EXACT (but shifted)
>>>>
>>>> I hope this makes more sense(?)
>>>>
>>>> Cheers,
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>> Hi Frank,
>>>>>
>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>> trunc_with_features because it's intended to represent a taking a
>>>>> subregion of a larger sequence, but if you're representing an internal
>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>> unknown,
>>>>> they have been specifically altered. Maybe you could give absolute
>>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>>> truncated by however many bases.
>>>>>
>>>>> Cheers,
>>>>> Roy.
>>>>>
>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>>> HOWTO for the BioPerl wiki.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>
>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>
>>>>>>> chris
>>>>>>>
>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>>> sequence
>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
>>>>>>>> vector
>>>>>>>> and insert a fragment into it while preserving all the
>>>>>>>> annotations and
>>>>>>>> moving the features accordingly.
>>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>>> sites in
>>>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>>>> methods.
>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>>> methods:
>>>>>>>>
>>>>>>>> delete
>>>>>>>> ======
>>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>>> and types
>>>>>>>> of locations of sequence features:
>>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>>> into
>>>>>>>> Splits.
>>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>>> Fuzzy to
>>>>>>>> indicate that their true start/end was lost.
>>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>>> - other features are shifted according to the length of the
>>>>>>>> deletion.
>>>>>>>>
>>>>>>>> insert
>>>>>>>> ======
>>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>>> - locations of features that span the insertion site are split but
>>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>>> original
>>>>>>>> feature is lost.
>>>>>>>> - other features are shifted according to the length of the
>>>>>>>> insertion.
>>>>>>>>
>>>>>>>> ligate
>>>>>>>> ======
>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>>> insert in
>>>>>>>> turn.
>>>>>>>>
>>>>>>>>
>>>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>>>> origin
>>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>>> anyway. The
>>>>>>>> code currently throws an error if this is attempted.
>>>>>>>>
>>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>
>>>>>>>> Frank
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Fields, Christopher J
Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable.  It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release).

chris

On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:

> Hi Frank,
>
> Looks great, I like the use of between locations, didn't think of that.
>
> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is?
>
> Cheers,
> Roy.
>
> On 11/01/2012 18:16, Frank Schwach wrote:
>> Hi Roy and Chris,
>>
>> I have made the changes to the code now. As you suggested, feature ends
>> no longer change type and I insert a note instead to inform about the
>> deletion (or insertion), showing the length and position.
>> I have also added a feature to annotate deletion sites themselves (with
>> IN-BETWEEN locations).
>>
>> Roy's test script now prints:
>>
>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>> ACCESSION   unknown
>> FEATURES             Location/Qualifiers
>>       CDS             join(2..3,4..6)
>>                       /note="3bp internal deletion between pos 3 and 4"
>>       CDS             2..3
>>                       /note="2bp deleted from feature end"
>>       misc_feature    3^4
>>                       /note="deletion of 3bp"
>> ORIGIN
>>          1 aaaaaaa
>> //
>>
>>
>> or, if you add strand information (-1 in this case) to the second feature:
>>
>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>> ACCESSION   unknown
>> FEATURES             Location/Qualifiers
>>       CDS             join(2..3,4..6)
>>                       /note="3bp internal deletion between pos 3 and 4"
>>       CDS             complement(2..3)
>>                       /note="2bp deleted from feature 5' end"
>>       misc_feature    3^4
>>                       /note="deletion of 3bp"
>> ORIGIN
>>          1 aaaaaaa
>> //
>>
>> I have comitted this along with some bugfixes to my master branch on GitHub
>> https://github.com/fschwach/bioperl-live
>> so it's now also in my existing pull request.
>>
>> I'm still wondering if cloning the sequence objects rather than calling
>> 'new' on their respective classes would be an option inside 'delete' and
>> 'insert'?
>> I'm experimenting with this for my own purposes because I have to work
>> with custom sub-classes of Bio::Seq which have additional attributes and
>> therefore set 'can_call_new' to false.
>> Without cloning the objects, I first have to convert the custom
>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
>> Is there any reason why something like Clone::Fast should not be used in
>> this case? It seems to work for me but there may be situations where
>> this is going to blow up which I am not aware of.
>> Cloning rather than calling new could be made an option in
>> Bio::SeqUtils. I have most of the code for that already.
>>
>> Frank
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 10/01/12 17:31, Roy Chaudhuri wrote:
>>> Or without the typo:
>>>
>>> CDS             join(2..3,4..6)
>>>                 /note="3 bp internal deletion"
>>> CDS             2..3
>>>                 /note="2 bp deleted from 3' end"
>>>
>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>>> I think it's me that didn't explain very well - I was talking about
>>>> overlapping (rather than spanning) a deletion, although I think the same
>>>> principle applies to the spanning example you gave. Here's some test
>>>> code:
>>>>
>>>> #!/usr/bin/perl
>>>> use warnings FATAL=>qw(all);
>>>> use strict;
>>>> use Bio::Seq;
>>>> use Bio::SeqIO;
>>>> use Bio::SeqUtils;
>>>> use Bio::SeqFeature::Generic;
>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>                                                      -start=>2,
>>>>                                                      -end=>9));
>>>>
>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>                                                      -start=>2,
>>>>                                                      -end=>5));
>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>>> print $out $trunc;
>>>>
>>>>
>>>> This currently outputs:
>>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>>> ACCESSION   unknown
>>>> FEATURES             Location/Qualifiers
>>>>        CDS             join(2..>3,<4..6)
>>>>        CDS             2..>3
>>>> ORIGIN
>>>>           1 aaaaaaa
>>>> //
>>>>
>>>> However, I was suggesting that the feature table should be something
>>>> like:
>>>> CDS             join(2..3,4..6)
>>>>                   /note="3 bp internal deletion"
>>>> CDS             join(2..3)
>>>>                   /note="2 bp deleted from 3' end"
>>>>
>>>> Fuzzy locations are intended to represent features which have boundaries
>>>> spanning outside of the sequence. For a defined deletion that's not the
>>>> case, the boundaries of the feature aren't unknown, they have been
>>>> specifically altered.
>>>>
>>>> Hope this is clearer.
>>>> Cheers,
>>>> Roy.
>>>>
>>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>>> Hi Roy,
>>>>>
>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>>> locations:
>>>>>
>>>>>    --------------------           a feature's location
>>>>> ==========xxxx================= sequence
>>>>>
>>>>>
>>>>>    ---------                     sublocation 1
>>>>>             --------             sublocation 2
>>>>> ===============================
>>>>>
>>>>> x= sequence to delete
>>>>> The feature's location has changed from Simple to Split.
>>>>>
>>>>> Sublocation 1:
>>>>> start is still EXACT and has not changed
>>>>> end is now AFTER because this is not a true end of the feature
>>>>>
>>>>> Sublocation 2:
>>>>> start is BEFORE
>>>>> end is EXACT (but shifted)
>>>>>
>>>>> I hope this makes more sense(?)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>>> Hi Frank,
>>>>>>
>>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>>> trunc_with_features because it's intended to represent a taking a
>>>>>> subregion of a larger sequence, but if you're representing an internal
>>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>>> unknown,
>>>>>> they have been specifically altered. Maybe you could give absolute
>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>>>> truncated by however many bases.
>>>>>>
>>>>>> Cheers,
>>>>>> Roy.
>>>>>>
>>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>>
>>>>>>>> chris
>>>>>>>>
>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>>>> sequence
>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
>>>>>>>>> vector
>>>>>>>>> and insert a fragment into it while preserving all the
>>>>>>>>> annotations and
>>>>>>>>> moving the features accordingly.
>>>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>>>> sites in
>>>>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>>>>> methods.
>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>>>> methods:
>>>>>>>>>
>>>>>>>>> delete
>>>>>>>>> ======
>>>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>>>> and types
>>>>>>>>> of locations of sequence features:
>>>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>>>> into
>>>>>>>>> Splits.
>>>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>>>> Fuzzy to
>>>>>>>>> indicate that their true start/end was lost.
>>>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>> deletion.
>>>>>>>>>
>>>>>>>>> insert
>>>>>>>>> ======
>>>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>>>> - locations of features that span the insertion site are split but
>>>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>>>> original
>>>>>>>>> feature is lost.
>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>> insertion.
>>>>>>>>>
>>>>>>>>> ligate
>>>>>>>>> ======
>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>>>> insert in
>>>>>>>>> turn.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>>>>> origin
>>>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>>>> anyway. The
>>>>>>>>> code currently throws an error if this is attempted.
>>>>>>>>>
>>>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
Great, I'll work on a branch that gives the user the option to use clone
instead of new and then we can see if we want to use that in the end. In
the meantime, what do you think about pulling this into bioperl-live?
When I have some time again I can work on the HOWTO for these new
features for the BioPerl wiki

Frank


On 11/01/12 18:42, Fields, Christopher J wrote:

> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable.  It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release).
>
> chris
>
> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:
>
>> Hi Frank,
>>
>> Looks great, I like the use of between locations, didn't think of that.
>>
>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is?
>>
>> Cheers,
>> Roy.
>>
>> On 11/01/2012 18:16, Frank Schwach wrote:
>>> Hi Roy and Chris,
>>>
>>> I have made the changes to the code now. As you suggested, feature ends
>>> no longer change type and I insert a note instead to inform about the
>>> deletion (or insertion), showing the length and position.
>>> I have also added a feature to annotate deletion sites themselves (with
>>> IN-BETWEEN locations).
>>>
>>> Roy's test script now prints:
>>>
>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>> ACCESSION   unknown
>>> FEATURES             Location/Qualifiers
>>>        CDS             join(2..3,4..6)
>>>                        /note="3bp internal deletion between pos 3 and 4"
>>>        CDS             2..3
>>>                        /note="2bp deleted from feature end"
>>>        misc_feature    3^4
>>>                        /note="deletion of 3bp"
>>> ORIGIN
>>>           1 aaaaaaa
>>> //
>>>
>>>
>>> or, if you add strand information (-1 in this case) to the second feature:
>>>
>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>> ACCESSION   unknown
>>> FEATURES             Location/Qualifiers
>>>        CDS             join(2..3,4..6)
>>>                        /note="3bp internal deletion between pos 3 and 4"
>>>        CDS             complement(2..3)
>>>                        /note="2bp deleted from feature 5' end"
>>>        misc_feature    3^4
>>>                        /note="deletion of 3bp"
>>> ORIGIN
>>>           1 aaaaaaa
>>> //
>>>
>>> I have comitted this along with some bugfixes to my master branch on GitHub
>>> https://github.com/fschwach/bioperl-live
>>> so it's now also in my existing pull request.
>>>
>>> I'm still wondering if cloning the sequence objects rather than calling
>>> 'new' on their respective classes would be an option inside 'delete' and
>>> 'insert'?
>>> I'm experimenting with this for my own purposes because I have to work
>>> with custom sub-classes of Bio::Seq which have additional attributes and
>>> therefore set 'can_call_new' to false.
>>> Without cloning the objects, I first have to convert the custom
>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
>>> Is there any reason why something like Clone::Fast should not be used in
>>> this case? It seems to work for me but there may be situations where
>>> this is going to blow up which I am not aware of.
>>> Cloning rather than calling new could be made an option in
>>> Bio::SeqUtils. I have most of the code for that already.
>>>
>>> Frank
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 10/01/12 17:31, Roy Chaudhuri wrote:
>>>> Or without the typo:
>>>>
>>>> CDS             join(2..3,4..6)
>>>>                  /note="3 bp internal deletion"
>>>> CDS             2..3
>>>>                  /note="2 bp deleted from 3' end"
>>>>
>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>>>> I think it's me that didn't explain very well - I was talking about
>>>>> overlapping (rather than spanning) a deletion, although I think the same
>>>>> principle applies to the spanning example you gave. Here's some test
>>>>> code:
>>>>>
>>>>> #!/usr/bin/perl
>>>>> use warnings FATAL=>qw(all);
>>>>> use strict;
>>>>> use Bio::Seq;
>>>>> use Bio::SeqIO;
>>>>> use Bio::SeqUtils;
>>>>> use Bio::SeqFeature::Generic;
>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>                                                       -start=>2,
>>>>>                                                       -end=>9));
>>>>>
>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>                                                       -start=>2,
>>>>>                                                       -end=>5));
>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>>>> print $out $trunc;
>>>>>
>>>>>
>>>>> This currently outputs:
>>>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>>>> ACCESSION   unknown
>>>>> FEATURES             Location/Qualifiers
>>>>>         CDS             join(2..>3,<4..6)
>>>>>         CDS             2..>3
>>>>> ORIGIN
>>>>>            1 aaaaaaa
>>>>> //
>>>>>
>>>>> However, I was suggesting that the feature table should be something
>>>>> like:
>>>>> CDS             join(2..3,4..6)
>>>>>                    /note="3 bp internal deletion"
>>>>> CDS             join(2..3)
>>>>>                    /note="2 bp deleted from 3' end"
>>>>>
>>>>> Fuzzy locations are intended to represent features which have boundaries
>>>>> spanning outside of the sequence. For a defined deletion that's not the
>>>>> case, the boundaries of the feature aren't unknown, they have been
>>>>> specifically altered.
>>>>>
>>>>> Hope this is clearer.
>>>>> Cheers,
>>>>> Roy.
>>>>>
>>>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>>>> Hi Roy,
>>>>>>
>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>>>> locations:
>>>>>>
>>>>>>     --------------------           a feature's location
>>>>>> ==========xxxx================= sequence
>>>>>>
>>>>>>
>>>>>>     ---------                     sublocation 1
>>>>>>              --------             sublocation 2
>>>>>> ===============================
>>>>>>
>>>>>> x= sequence to delete
>>>>>> The feature's location has changed from Simple to Split.
>>>>>>
>>>>>> Sublocation 1:
>>>>>> start is still EXACT and has not changed
>>>>>> end is now AFTER because this is not a true end of the feature
>>>>>>
>>>>>> Sublocation 2:
>>>>>> start is BEFORE
>>>>>> end is EXACT (but shifted)
>>>>>>
>>>>>> I hope this makes more sense(?)
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>>>> Hi Frank,
>>>>>>>
>>>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>>>> trunc_with_features because it's intended to represent a taking a
>>>>>>> subregion of a larger sequence, but if you're representing an internal
>>>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>>>> unknown,
>>>>>>> they have been specifically altered. Maybe you could give absolute
>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>>>>> truncated by however many bases.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Roy.
>>>>>>>
>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>>>>> sequence
>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
>>>>>>>>>> vector
>>>>>>>>>> and insert a fragment into it while preserving all the
>>>>>>>>>> annotations and
>>>>>>>>>> moving the features accordingly.
>>>>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>>>>> sites in
>>>>>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>>>>>> methods.
>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>>>>> methods:
>>>>>>>>>>
>>>>>>>>>> delete
>>>>>>>>>> ======
>>>>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>>>>> and types
>>>>>>>>>> of locations of sequence features:
>>>>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>>>>> into
>>>>>>>>>> Splits.
>>>>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>>>>> Fuzzy to
>>>>>>>>>> indicate that their true start/end was lost.
>>>>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>> deletion.
>>>>>>>>>>
>>>>>>>>>> insert
>>>>>>>>>> ======
>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>>>>> - locations of features that span the insertion site are split but
>>>>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>>>>> original
>>>>>>>>>> feature is lost.
>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>> insertion.
>>>>>>>>>>
>>>>>>>>>> ligate
>>>>>>>>>> ======
>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>>>>> insert in
>>>>>>>>>> turn.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>>>>>> origin
>>>>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>>>>> anyway. The
>>>>>>>>>> code currently throws an error if this is attempted.
>>>>>>>>>>
>>>>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>>>
>>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>


--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
I have now created a version that gives the option to create the
products of 'delete' and 'insert' via Bio::Root::Root:clone instead of
calling 'new' on the input seq object class. Seems to be working fine
for me so far.

'delete' and 'insert' can now take a hashref of options.
The only option so far is to set 'clone_obj to true, to use cloning
instead of creating objects via 'new'.
Setting this parameter to false or not supplying the options hashref at
all will give you the old behaviour (call 'new').
Example:

my $product = Bio::SeqUtils->delete(
  $seq_obj,
  11,
  20,
  { clone_obj => 1}
);

The ligate method takes clone_obj as a named parameter:

  my $new_molecule = Bio::Sequtils::Pbrtools->ligate(
             -recipient => $vector,
             -fragment => $fragment,
             -left => 1000,
             -right => 1100,
             -flip      => 1,
             -clone_obj => 1
           );

This is in a branch of my GitHub repo if you would like to have a look:

https://github.com/fschwach/bioperl-live/tree/sequtils_clone

Unfortunately, I can't add this option to trunc_with_features because
the creation of the new object is delegated to 'trunc'. I guess I could
implement 'trunc' in Bio::SeqUtils itself(?)

What do you think, could this be merged into bioperl-live?


Frank



On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote:

> Great, I'll work on a branch that gives the user the option to use clone
> instead of new and then we can see if we want to use that in the end. In
> the meantime, what do you think about pulling this into bioperl-live?
> When I have some time again I can work on the HOWTO for these new
> features for the BioPerl wiki
>
> Frank
>
>
> On 11/01/12 18:42, Fields, Christopher J wrote:
> > Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable.  It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release).
> >
> > chris
> >
> > On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:
> >
> >> Hi Frank,
> >>
> >> Looks great, I like the use of between locations, didn't think of that.
> >>
> >> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is?
> >>
> >> Cheers,
> >> Roy.
> >>
> >> On 11/01/2012 18:16, Frank Schwach wrote:
> >>> Hi Roy and Chris,
> >>>
> >>> I have made the changes to the code now. As you suggested, feature ends
> >>> no longer change type and I insert a note instead to inform about the
> >>> deletion (or insertion), showing the length and position.
> >>> I have also added a feature to annotate deletion sites themselves (with
> >>> IN-BETWEEN locations).
> >>>
> >>> Roy's test script now prints:
> >>>
> >>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> >>> ACCESSION   unknown
> >>> FEATURES             Location/Qualifiers
> >>>        CDS             join(2..3,4..6)
> >>>                        /note="3bp internal deletion between pos 3 and 4"
> >>>        CDS             2..3
> >>>                        /note="2bp deleted from feature end"
> >>>        misc_feature    3^4
> >>>                        /note="deletion of 3bp"
> >>> ORIGIN
> >>>           1 aaaaaaa
> >>> //
> >>>
> >>>
> >>> or, if you add strand information (-1 in this case) to the second feature:
> >>>
> >>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> >>> ACCESSION   unknown
> >>> FEATURES             Location/Qualifiers
> >>>        CDS             join(2..3,4..6)
> >>>                        /note="3bp internal deletion between pos 3 and 4"
> >>>        CDS             complement(2..3)
> >>>                        /note="2bp deleted from feature 5' end"
> >>>        misc_feature    3^4
> >>>                        /note="deletion of 3bp"
> >>> ORIGIN
> >>>           1 aaaaaaa
> >>> //
> >>>
> >>> I have comitted this along with some bugfixes to my master branch on GitHub
> >>> https://github.com/fschwach/bioperl-live
> >>> so it's now also in my existing pull request.
> >>>
> >>> I'm still wondering if cloning the sequence objects rather than calling
> >>> 'new' on their respective classes would be an option inside 'delete' and
> >>> 'insert'?
> >>> I'm experimenting with this for my own purposes because I have to work
> >>> with custom sub-classes of Bio::Seq which have additional attributes and
> >>> therefore set 'can_call_new' to false.
> >>> Without cloning the objects, I first have to convert the custom
> >>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
> >>> Is there any reason why something like Clone::Fast should not be used in
> >>> this case? It seems to work for me but there may be situations where
> >>> this is going to blow up which I am not aware of.
> >>> Cloning rather than calling new could be made an option in
> >>> Bio::SeqUtils. I have most of the code for that already.
> >>>
> >>> Frank
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 10/01/12 17:31, Roy Chaudhuri wrote:
> >>>> Or without the typo:
> >>>>
> >>>> CDS             join(2..3,4..6)
> >>>>                  /note="3 bp internal deletion"
> >>>> CDS             2..3
> >>>>                  /note="2 bp deleted from 3' end"
> >>>>
> >>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
> >>>>> I think it's me that didn't explain very well - I was talking about
> >>>>> overlapping (rather than spanning) a deletion, although I think the same
> >>>>> principle applies to the spanning example you gave. Here's some test
> >>>>> code:
> >>>>>
> >>>>> #!/usr/bin/perl
> >>>>> use warnings FATAL=>qw(all);
> >>>>> use strict;
> >>>>> use Bio::Seq;
> >>>>> use Bio::SeqIO;
> >>>>> use Bio::SeqUtils;
> >>>>> use Bio::SeqFeature::Generic;
> >>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
> >>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
> >>>>>                                                       -start=>2,
> >>>>>                                                       -end=>9));
> >>>>>
> >>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
> >>>>>                                                       -start=>2,
> >>>>>                                                       -end=>5));
> >>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
> >>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
> >>>>> print $out $trunc;
> >>>>>
> >>>>>
> >>>>> This currently outputs:
> >>>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> >>>>> ACCESSION   unknown
> >>>>> FEATURES             Location/Qualifiers
> >>>>>         CDS             join(2..>3,<4..6)
> >>>>>         CDS             2..>3
> >>>>> ORIGIN
> >>>>>            1 aaaaaaa
> >>>>> //
> >>>>>
> >>>>> However, I was suggesting that the feature table should be something
> >>>>> like:
> >>>>> CDS             join(2..3,4..6)
> >>>>>                    /note="3 bp internal deletion"
> >>>>> CDS             join(2..3)
> >>>>>                    /note="2 bp deleted from 3' end"
> >>>>>
> >>>>> Fuzzy locations are intended to represent features which have boundaries
> >>>>> spanning outside of the sequence. For a defined deletion that's not the
> >>>>> case, the boundaries of the feature aren't unknown, they have been
> >>>>> specifically altered.
> >>>>>
> >>>>> Hope this is clearer.
> >>>>> Cheers,
> >>>>> Roy.
> >>>>>
> >>>>> On 10/01/2012 16:47, Frank Schwach wrote:
> >>>>>> Hi Roy,
> >>>>>>
> >>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
> >>>>>> of the feature that become fuzzy but the "inner" ones of the split
> >>>>>> locations:
> >>>>>>
> >>>>>>     --------------------           a feature's location
> >>>>>> ==========xxxx================= sequence
> >>>>>>
> >>>>>>
> >>>>>>     ---------                     sublocation 1
> >>>>>>              --------             sublocation 2
> >>>>>> ===============================
> >>>>>>
> >>>>>> x= sequence to delete
> >>>>>> The feature's location has changed from Simple to Split.
> >>>>>>
> >>>>>> Sublocation 1:
> >>>>>> start is still EXACT and has not changed
> >>>>>> end is now AFTER because this is not a true end of the feature
> >>>>>>
> >>>>>> Sublocation 2:
> >>>>>> start is BEFORE
> >>>>>> end is EXACT (but shifted)
> >>>>>>
> >>>>>> I hope this makes more sense(?)
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Frank
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
> >>>>>>> Hi Frank,
> >>>>>>>
> >>>>>>> Looks good to me. One thing I'm not sure about - why do features
> >>>>>>> overlapping a deletion become fuzzy? That behaviour is in
> >>>>>>> trunc_with_features because it's intended to represent a taking a
> >>>>>>> subregion of a larger sequence, but if you're representing an internal
> >>>>>>> deletion then the boundaries of the overlapping feature aren't
> >>>>>>> unknown,
> >>>>>>> they have been specifically altered. Maybe you could give absolute
> >>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been
> >>>>>>> truncated by however many bases.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Roy.
> >>>>>>>
> >>>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
> >>>>>>>> Hi Chris,
> >>>>>>>>
> >>>>>>>> I have made the changes in a Git fork and made the pull request now.
> >>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
> >>>>>>>> HOWTO for the BioPerl wiki.
> >>>>>>>>
> >>>>>>>> Frank
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
> >>>>>>>>> Sounds very promising!  The easiest way to contribute is via a
> >>>>>>>>> fork of the code on Github with a pull request (as you already
> >>>>>>>>> know, being a contributor to the Primer3 modules).
> >>>>>>>>>
> >>>>>>>>> chris
> >>>>>>>>>
> >>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
> >>>>>>>>>> sequence
> >>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
> >>>>>>>>>> vector
> >>>>>>>>>> and insert a fragment into it while preserving all the
> >>>>>>>>>> annotations and
> >>>>>>>>>> moving the features accordingly.
> >>>>>>>>>> My main aim was to split features that span deletion/insertion
> >>>>>>>>>> sites in
> >>>>>>>>>> a meaningful way, which can not be done with the currently availble
> >>>>>>>>>> methods.
> >>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
> >>>>>>>>>> methods:
> >>>>>>>>>>
> >>>>>>>>>> delete
> >>>>>>>>>> ======
> >>>>>>>>>> removes a segment from a sequence object and adjusts positions
> >>>>>>>>>> and types
> >>>>>>>>>> of locations of sequence features:
> >>>>>>>>>> - locations of features that span the deletion sites are turned
> >>>>>>>>>> into
> >>>>>>>>>> Splits.
> >>>>>>>>>> - locations that extend into the deleted region are turned to
> >>>>>>>>>> Fuzzy to
> >>>>>>>>>> indicate that their true start/end was lost.
> >>>>>>>>>> - locations contained inside the deleted regions are lost.
> >>>>>>>>>> - other features are shifted according to the length of the
> >>>>>>>>>> deletion.
> >>>>>>>>>>
> >>>>>>>>>> insert
> >>>>>>>>>> ======
> >>>>>>>>>> adds a Bio::Seq object into another one between specified insertion
> >>>>>>>>>> sites. This also affects the features on the recipient sequence:
> >>>>>>>>>> - locations of features that span the insertion site are split but
> >>>>>>>>>> position types are not turned to Fuzzy because no part of the
> >>>>>>>>>> original
> >>>>>>>>>> feature is lost.
> >>>>>>>>>> - other features are shifted according to the length of the
> >>>>>>>>>> insertion.
> >>>>>>>>>>
> >>>>>>>>>> ligate
> >>>>>>>>>> ======
> >>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
> >>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
> >>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
> >>>>>>>>>> insert in
> >>>>>>>>>> turn.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> One situation I haven't handled yet is a deletion that spans the
> >>>>>>>>>> origin
> >>>>>>>>>> of a circular molecule but that should be a rare thing to do
> >>>>>>>>>> anyway. The
> >>>>>>>>>> code currently throws an error if this is attempted.
> >>>>>>>>>>
> >>>>>>>>>> I'm happy to contribute the code on Github if there is interest?
> >>>>>>>>>> Comments on the handling of feature locations highly welcome!
> >>>>>>>>>>
> >>>>>>>>>> Frank
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>
>
>



--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach
In reply to this post by Frank Schwach
Hi Chris,

Thanks for merging my pull request. I have a version now tha allows
using clone as an option. Would that be of interest too?

Frank

On 11/01/12 21:03, Frank Schwach wrote:

> Great, I'll work on a branch that gives the user the option to use
> clone instead of new and then we can see if we want to use that in the
> end. In the meantime, what do you think about pulling this into
> bioperl-live? When I have some time again I can work on the HOWTO for
> these new features for the BioPerl wiki
>
> Frank
>
>
> On 11/01/12 18:42, Fields, Christopher J wrote:
>> Note that Bio::Root::Root now has a clone() method that one can take
>> advantage of for this purpose; if Storable or Clone is available, it
>> will pick one of the two, preferably Clone over Storable.  It's
>> fairly untested, but we haven't run into problems with it yet (I
>> think it was in the last CPAN release).
>>
>> chris
>>
>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:
>>
>>> Hi Frank,
>>>
>>> Looks great, I like the use of between locations, didn't think of that.
>>>
>>> It was suggested that I avoid using Clone for cat,
>>> trunc_with_features etc. to avoid adding a dependency (which may no
>>> longer be an issue) and because it would cause problems for Bio::Seq
>>> implementations that use a database as the back-end. Maybe you could
>>> add it as an option, but keep the default as is?
>>>
>>> Cheers,
>>> Roy.
>>>
>>> On 11/01/2012 18:16, Frank Schwach wrote:
>>>> Hi Roy and Chris,
>>>>
>>>> I have made the changes to the code now. As you suggested, feature
>>>> ends
>>>> no longer change type and I insert a note instead to inform about the
>>>> deletion (or insertion), showing the length and position.
>>>> I have also added a feature to annotate deletion sites themselves
>>>> (with
>>>> IN-BETWEEN locations).
>>>>
>>>> Roy's test script now prints:
>>>>
>>>> LOCUS       seq-accession_number            7 bp    dna    
>>>> linear   UNK
>>>> ACCESSION   unknown
>>>> FEATURES             Location/Qualifiers
>>>>        CDS             join(2..3,4..6)
>>>>                        /note="3bp internal deletion between pos 3
>>>> and 4"
>>>>        CDS             2..3
>>>>                        /note="2bp deleted from feature end"
>>>>        misc_feature    3^4
>>>>                        /note="deletion of 3bp"
>>>> ORIGIN
>>>>           1 aaaaaaa
>>>> //
>>>>
>>>>
>>>> or, if you add strand information (-1 in this case) to the second
>>>> feature:
>>>>
>>>> LOCUS       seq-accession_number            7 bp    dna    
>>>> linear   UNK
>>>> ACCESSION   unknown
>>>> FEATURES             Location/Qualifiers
>>>>        CDS             join(2..3,4..6)
>>>>                        /note="3bp internal deletion between pos 3
>>>> and 4"
>>>>        CDS             complement(2..3)
>>>>                        /note="2bp deleted from feature 5' end"
>>>>        misc_feature    3^4
>>>>                        /note="deletion of 3bp"
>>>> ORIGIN
>>>>           1 aaaaaaa
>>>> //
>>>>
>>>> I have comitted this along with some bugfixes to my master branch
>>>> on GitHub
>>>> https://github.com/fschwach/bioperl-live
>>>> so it's now also in my existing pull request.
>>>>
>>>> I'm still wondering if cloning the sequence objects rather than
>>>> calling
>>>> 'new' on their respective classes would be an option inside
>>>> 'delete' and
>>>> 'insert'?
>>>> I'm experimenting with this for my own purposes because I have to work
>>>> with custom sub-classes of Bio::Seq which have additional
>>>> attributes and
>>>> therefore set 'can_call_new' to false.
>>>> Without cloning the objects, I first have to convert the custom
>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to
>>>> avoid.
>>>> Is there any reason why something like Clone::Fast should not be
>>>> used in
>>>> this case? It seems to work for me but there may be situations where
>>>> this is going to blow up which I am not aware of.
>>>> Cloning rather than calling new could be made an option in
>>>> Bio::SeqUtils. I have most of the code for that already.
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 10/01/12 17:31, Roy Chaudhuri wrote:
>>>>> Or without the typo:
>>>>>
>>>>> CDS             join(2..3,4..6)
>>>>>                  /note="3 bp internal deletion"
>>>>> CDS             2..3
>>>>>                  /note="2 bp deleted from 3' end"
>>>>>
>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>>>>> I think it's me that didn't explain very well - I was talking about
>>>>>> overlapping (rather than spanning) a deletion, although I think
>>>>>> the same
>>>>>> principle applies to the spanning example you gave. Here's some test
>>>>>> code:
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>> use warnings FATAL=>qw(all);
>>>>>> use strict;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::SeqUtils;
>>>>>> use Bio::SeqFeature::Generic;
>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>
>>>>>>                                                       -start=>2,
>>>>>>                                                       -end=>9));
>>>>>>
>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>
>>>>>>                                                       -start=>2,
>>>>>>                                                       -end=>5));
>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>>>>> print $out $trunc;
>>>>>>
>>>>>>
>>>>>> This currently outputs:
>>>>>> LOCUS       seq-accession_number            7 bp    dna    
>>>>>> linear   UNK
>>>>>> ACCESSION   unknown
>>>>>> FEATURES             Location/Qualifiers
>>>>>>         CDS             join(2..>3,<4..6)
>>>>>>         CDS             2..>3
>>>>>> ORIGIN
>>>>>>            1 aaaaaaa
>>>>>> //
>>>>>>
>>>>>> However, I was suggesting that the feature table should be something
>>>>>> like:
>>>>>> CDS             join(2..3,4..6)
>>>>>>                    /note="3 bp internal deletion"
>>>>>> CDS             join(2..3)
>>>>>>                    /note="2 bp deleted from 3' end"
>>>>>>
>>>>>> Fuzzy locations are intended to represent features which have
>>>>>> boundaries
>>>>>> spanning outside of the sequence. For a defined deletion that's
>>>>>> not the
>>>>>> case, the boundaries of the feature aren't unknown, they have been
>>>>>> specifically altered.
>>>>>>
>>>>>> Hope this is clearer.
>>>>>> Cheers,
>>>>>> Roy.
>>>>>>
>>>>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>>>>> Hi Roy,
>>>>>>>
>>>>>>> Sorry, I hadn't explained that very well: it's not the outer
>>>>>>> boundaries
>>>>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>>>>> locations:
>>>>>>>
>>>>>>>     --------------------           a feature's location
>>>>>>> ==========xxxx================= sequence
>>>>>>>
>>>>>>>
>>>>>>>     ---------                     sublocation 1
>>>>>>>              --------             sublocation 2
>>>>>>> ===============================
>>>>>>>
>>>>>>> x= sequence to delete
>>>>>>> The feature's location has changed from Simple to Split.
>>>>>>>
>>>>>>> Sublocation 1:
>>>>>>> start is still EXACT and has not changed
>>>>>>> end is now AFTER because this is not a true end of the feature
>>>>>>>
>>>>>>> Sublocation 2:
>>>>>>> start is BEFORE
>>>>>>> end is EXACT (but shifted)
>>>>>>>
>>>>>>> I hope this makes more sense(?)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>>>>> Hi Frank,
>>>>>>>>
>>>>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>>>>> trunc_with_features because it's intended to represent a taking a
>>>>>>>> subregion of a larger sequence, but if you're representing an
>>>>>>>> internal
>>>>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>>>>> unknown,
>>>>>>>> they have been specifically altered. Maybe you could give absolute
>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end
>>>>>>>> has been
>>>>>>>> truncated by however many bases.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Roy.
>>>>>>>>
>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> I have made the changes in a Git fork and made the pull
>>>>>>>>> request now.
>>>>>>>>> If this is accepted into BioPerl I can also write a little
>>>>>>>>> SeqUtils
>>>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>>>>
>>>>>>>>>> chris
>>>>>>>>>>
>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>>>>>> sequence
>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to
>>>>>>>>>>> cut a
>>>>>>>>>>> vector
>>>>>>>>>>> and insert a fragment into it while preserving all the
>>>>>>>>>>> annotations and
>>>>>>>>>>> moving the features accordingly.
>>>>>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>>>>>> sites in
>>>>>>>>>>> a meaningful way, which can not be done with the currently
>>>>>>>>>>> availble
>>>>>>>>>>> methods.
>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>>>>>> methods:
>>>>>>>>>>>
>>>>>>>>>>> delete
>>>>>>>>>>> ======
>>>>>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>>>>>> and types
>>>>>>>>>>> of locations of sequence features:
>>>>>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>>>>>> into
>>>>>>>>>>> Splits.
>>>>>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>>>>>> Fuzzy to
>>>>>>>>>>> indicate that their true start/end was lost.
>>>>>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>>> deletion.
>>>>>>>>>>>
>>>>>>>>>>> insert
>>>>>>>>>>> ======
>>>>>>>>>>> adds a Bio::Seq object into another one between specified
>>>>>>>>>>> insertion
>>>>>>>>>>> sites. This also affects the features on the recipient
>>>>>>>>>>> sequence:
>>>>>>>>>>> - locations of features that span the insertion site are
>>>>>>>>>>> split but
>>>>>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>>>>>> original
>>>>>>>>>>> feature is lost.
>>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>>> insertion.
>>>>>>>>>>>
>>>>>>>>>>> ligate
>>>>>>>>>>> ======
>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one
>>>>>>>>>>> or two
>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if
>>>>>>>>>>> required.
>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>>>>>> insert in
>>>>>>>>>>> turn.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans
>>>>>>>>>>> the
>>>>>>>>>>> origin
>>>>>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>>>>>> anyway. The
>>>>>>>>>>> code currently throws an error if this is attempted.
>>>>>>>>>>>
>>>>>>>>>>> I'm happy to contribute the code on Github if there is
>>>>>>>>>>> interest?
>>>>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>>>>
>>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>
>


--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Reply | Threaded
Open this post in threaded view
|

Re: additional methods for Bio::SeqUtils for in-silico cloning

Fields, Christopher J
Yes.  I don't see a pull request for it yet, though, let me know when it is ready.

chris

On Jan 13, 2012, at 3:47 PM, Frank Schwach wrote:

> Hi Chris,
>
> Thanks for merging my pull request. I have a version now tha allows using clone as an option. Would that be of interest too?
>
> Frank
>
> On 11/01/12 21:03, Frank Schwach wrote:
>> Great, I'll work on a branch that gives the user the option to use clone instead of new and then we can see if we want to use that in the end. In the meantime, what do you think about pulling this into bioperl-live? When I have some time again I can work on the HOWTO for these new features for the BioPerl wiki
>>
>> Frank
>>
>>
>> On 11/01/12 18:42, Fields, Christopher J wrote:
>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable.  It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release).
>>>
>>> chris
>>>
>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:
>>>
>>>> Hi Frank,
>>>>
>>>> Looks great, I like the use of between locations, didn't think of that.
>>>>
>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is?
>>>>
>>>> Cheers,
>>>> Roy.
>>>>
>>>> On 11/01/2012 18:16, Frank Schwach wrote:
>>>>> Hi Roy and Chris,
>>>>>
>>>>> I have made the changes to the code now. As you suggested, feature ends
>>>>> no longer change type and I insert a note instead to inform about the
>>>>> deletion (or insertion), showing the length and position.
>>>>> I have also added a feature to annotate deletion sites themselves (with
>>>>> IN-BETWEEN locations).
>>>>>
>>>>> Roy's test script now prints:
>>>>>
>>>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>>>> ACCESSION   unknown
>>>>> FEATURES             Location/Qualifiers
>>>>>       CDS             join(2..3,4..6)
>>>>>                       /note="3bp internal deletion between pos 3 and 4"
>>>>>       CDS             2..3
>>>>>                       /note="2bp deleted from feature end"
>>>>>       misc_feature    3^4
>>>>>                       /note="deletion of 3bp"
>>>>> ORIGIN
>>>>>          1 aaaaaaa
>>>>> //
>>>>>
>>>>>
>>>>> or, if you add strand information (-1 in this case) to the second feature:
>>>>>
>>>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>>>> ACCESSION   unknown
>>>>> FEATURES             Location/Qualifiers
>>>>>       CDS             join(2..3,4..6)
>>>>>                       /note="3bp internal deletion between pos 3 and 4"
>>>>>       CDS             complement(2..3)
>>>>>                       /note="2bp deleted from feature 5' end"
>>>>>       misc_feature    3^4
>>>>>                       /note="deletion of 3bp"
>>>>> ORIGIN
>>>>>          1 aaaaaaa
>>>>> //
>>>>>
>>>>> I have comitted this along with some bugfixes to my master branch on GitHub
>>>>> https://github.com/fschwach/bioperl-live
>>>>> so it's now also in my existing pull request.
>>>>>
>>>>> I'm still wondering if cloning the sequence objects rather than calling
>>>>> 'new' on their respective classes would be an option inside 'delete' and
>>>>> 'insert'?
>>>>> I'm experimenting with this for my own purposes because I have to work
>>>>> with custom sub-classes of Bio::Seq which have additional attributes and
>>>>> therefore set 'can_call_new' to false.
>>>>> Without cloning the objects, I first have to convert the custom
>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
>>>>> Is there any reason why something like Clone::Fast should not be used in
>>>>> this case? It seems to work for me but there may be situations where
>>>>> this is going to blow up which I am not aware of.
>>>>> Cloning rather than calling new could be made an option in
>>>>> Bio::SeqUtils. I have most of the code for that already.
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote:
>>>>>> Or without the typo:
>>>>>>
>>>>>> CDS             join(2..3,4..6)
>>>>>>                 /note="3 bp internal deletion"
>>>>>> CDS             2..3
>>>>>>                 /note="2 bp deleted from 3' end"
>>>>>>
>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>>>>>> I think it's me that didn't explain very well - I was talking about
>>>>>>> overlapping (rather than spanning) a deletion, although I think the same
>>>>>>> principle applies to the spanning example you gave. Here's some test
>>>>>>> code:
>>>>>>>
>>>>>>> #!/usr/bin/perl
>>>>>>> use warnings FATAL=>qw(all);
>>>>>>> use strict;
>>>>>>> use Bio::Seq;
>>>>>>> use Bio::SeqIO;
>>>>>>> use Bio::SeqUtils;
>>>>>>> use Bio::SeqFeature::Generic;
>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>>                                                      -start=>2,
>>>>>>>                                                      -end=>9));
>>>>>>>
>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>>                                                      -start=>2,
>>>>>>>                                                      -end=>5));
>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>>>>>> print $out $trunc;
>>>>>>>
>>>>>>>
>>>>>>> This currently outputs:
>>>>>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>>>>>> ACCESSION   unknown
>>>>>>> FEATURES             Location/Qualifiers
>>>>>>>        CDS             join(2..>3,<4..6)
>>>>>>>        CDS             2..>3
>>>>>>> ORIGIN
>>>>>>>           1 aaaaaaa
>>>>>>> //
>>>>>>>
>>>>>>> However, I was suggesting that the feature table should be something
>>>>>>> like:
>>>>>>> CDS             join(2..3,4..6)
>>>>>>>                   /note="3 bp internal deletion"
>>>>>>> CDS             join(2..3)
>>>>>>>                   /note="2 bp deleted from 3' end"
>>>>>>>
>>>>>>> Fuzzy locations are intended to represent features which have boundaries
>>>>>>> spanning outside of the sequence. For a defined deletion that's not the
>>>>>>> case, the boundaries of the feature aren't unknown, they have been
>>>>>>> specifically altered.
>>>>>>>
>>>>>>> Hope this is clearer.
>>>>>>> Cheers,
>>>>>>> Roy.
>>>>>>>
>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>>>>>> Hi Roy,
>>>>>>>>
>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>>>>>> locations:
>>>>>>>>
>>>>>>>>    --------------------           a feature's location
>>>>>>>> ==========xxxx================= sequence
>>>>>>>>
>>>>>>>>
>>>>>>>>    ---------                     sublocation 1
>>>>>>>>             --------             sublocation 2
>>>>>>>> ===============================
>>>>>>>>
>>>>>>>> x= sequence to delete
>>>>>>>> The feature's location has changed from Simple to Split.
>>>>>>>>
>>>>>>>> Sublocation 1:
>>>>>>>> start is still EXACT and has not changed
>>>>>>>> end is now AFTER because this is not a true end of the feature
>>>>>>>>
>>>>>>>> Sublocation 2:
>>>>>>>> start is BEFORE
>>>>>>>> end is EXACT (but shifted)
>>>>>>>>
>>>>>>>> I hope this makes more sense(?)
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>>>>>> Hi Frank,
>>>>>>>>>
>>>>>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>>>>>> trunc_with_features because it's intended to represent a taking a
>>>>>>>>> subregion of a larger sequence, but if you're representing an internal
>>>>>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>>>>>> unknown,
>>>>>>>>> they have been specifically altered. Maybe you could give absolute
>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>>>>>>> truncated by however many bases.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Roy.
>>>>>>>>>
>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>>>>>> Hi Chris,
>>>>>>>>>>
>>>>>>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>>>>
>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>>>>>
>>>>>>>>>>> chris
>>>>>>>>>>>
>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>>>>>>> sequence
>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a
>>>>>>>>>>>> vector
>>>>>>>>>>>> and insert a fragment into it while preserving all the
>>>>>>>>>>>> annotations and
>>>>>>>>>>>> moving the features accordingly.
>>>>>>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>>>>>>> sites in
>>>>>>>>>>>> a meaningful way, which can not be done with the currently availble
>>>>>>>>>>>> methods.
>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>>>>>>> methods:
>>>>>>>>>>>>
>>>>>>>>>>>> delete
>>>>>>>>>>>> ======
>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>>>>>>> and types
>>>>>>>>>>>> of locations of sequence features:
>>>>>>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>>>>>>> into
>>>>>>>>>>>> Splits.
>>>>>>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>>>>>>> Fuzzy to
>>>>>>>>>>>> indicate that their true start/end was lost.
>>>>>>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>>>> deletion.
>>>>>>>>>>>>
>>>>>>>>>>>> insert
>>>>>>>>>>>> ======
>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>>>>>>> - locations of features that span the insertion site are split but
>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>>>>>>> original
>>>>>>>>>>>> feature is lost.
>>>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>>>> insertion.
>>>>>>>>>>>>
>>>>>>>>>>>> ligate
>>>>>>>>>>>> ======
>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>>>>>>> insert in
>>>>>>>>>>>> turn.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>>>>>>>> origin
>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>>>>>>> anyway. The
>>>>>>>>>>>> code currently throws an error if this is attempted.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>>>>>
>>>>>>>>>>>> Frank
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>
>>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
12