BP split progress and rationale

Mark A. Jensen
All,

I've made some significant progress towards a BP split. I know there
have been several tries, but I'm willing to take this one to an
actionable endpoint with YAPC::NA 2016 as a goal date for action.

I have built a graph of all the module dependencies (parent-child and
horizontal) in Neo4j, and have been using this to design module
groupings that encompass functional areas and also have hierarchical
group dependencies such that the dependencies between groups are
minimized. I'm calling the groupings "packages".

I am using the loose convention that "monophyletic" packages (groups of
modules that fall within a namespace) are named after the namespace, and
"polyphyletic" packages are named "BioPerl::<functional name>". The
following packages are currently pretty solid. The descriptions indicate
mainly what is encompassed by the contained modules, not rules for
membership.

BioPerl::Base - Bio::Root::*, general design pattern helpers (i.e.,
many Bio::*I, Bio::Factory::*, Build helper classes.)

BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can do
without annotations (e.g., fasta)

BioPerl::Alignment - alignment objects and parsers

BioPerl::Annotation - most annotation modules

BioPerl::SeqFeature - most SeqFeature modules

BioPerl::Tree - most Tree-related modules

BioPerl::DB - most Bio::DB::*, Bio::Das interfaces

BioPerl::Search - BLAST parsing and tiling

There are quite a few more. Examples of the logic: BioPerl::Base
contains all of its dependencies. BioPerl::Sequence requires only
BioPerl::Base to satisfy all its BP dependencies. BioPerl::Alignment
requires BioPerl::Base and BioPerl::Sequence. BioPerl::Search requires
Base, Sequence, and SeqFeature. And so on.
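
A minimal sketch of how this layering could be resolved into an install
order. The prerequisites for Base, Sequence, Alignment, and Search are
the ones listed above; the entry for BioPerl::SeqFeature is an
assumption, and the names REQUIRES and install_order are hypothetical.

```python
# Direct package prerequisites as stated in this message.
# The BioPerl::SeqFeature entry is an assumption; the message does not give it.
REQUIRES = {
    "BioPerl::Base": [],
    "BioPerl::Sequence": ["BioPerl::Base"],
    "BioPerl::Alignment": ["BioPerl::Base", "BioPerl::Sequence"],
    "BioPerl::SeqFeature": ["BioPerl::Base", "BioPerl::Sequence"],  # assumed
    "BioPerl::Search": ["BioPerl::Base", "BioPerl::Sequence", "BioPerl::SeqFeature"],
}


def install_order(package, requires=REQUIRES):
    """Return the package and all its prerequisites, prerequisites first."""
    order = []
    seen = set()

    def visit(name):
        # Skip packages that have already been visited.
        if name in seen:
            return
        seen.add(name)
        for dep in requires[name]:
            visit(dep)
        # A package is appended only after all of its prerequisites.
        order.append(name)

    visit(package)
    return order


print(install_order("BioPerl::Sequence"))
print(install_order("BioPerl::Search"))
```

Asking for BioPerl::Sequence yields only Base and Sequence, which is the
point of the grouping: the user never touches the other packages.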

With a structure like this, a user who just needs Bio::PrimarySeq and
Bio::SeqIO to read some fasta files can get away with installing
BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to
the full 805 modules, including that broadly useful one
"Bio::DB::HIV::HIVQueryHelper".

Once finished, I'll propose setting many of the namespaces free as
separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others.
These can be packaged with their appropriate BioPerl::* prerequisites in
the metadata. I expect this will allow natural selection to operate much
more efficiently on the obsolete modules.

I will set up CPAN::Meta compliant metadata for everything.

I have more thoughts but this is already too long.

MAJ




_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l
Re: BP split progress and rationale

Brian Osborne-2
Mark,

I don’t understand. Last year I put Bio::Root* back into bioperl-live, to simplify installation. Now we are splitting again?

IMO Bio::Base/Bio::Root and Bio::Seq*/Bio::SeqIO* should never be separate. Generally people install BioPerl to get IO and basic Sequence object functionality. Why would Bio::Root (always required) be separate from things like Bio::Seq and SeqIO (always requested)?

Simplicity, please. BioPerl has very few people actively engaged these days, and the numbers there are steadily dropping. Everything we do should be geared towards simplicity and efficiency. Another example: SeqFeature and Annotation. Why separate them? They are almost always used together.

Then there’s the maintenance, and documentation. Please don’t take this personally, MAJ; this business about splitting everything up is an old idea, an unquestioned assumption. Time to reconsider it.

Brian O.



> On Jun 1, 2016, at 1:06 AM, Mark A. Jensen <[hidden email]> wrote:

Re: BP split progress and rationale

Mark A. Jensen
Wow, Brian,
"Generally people install BioPerl to get IO and basic functionality"?
Generally, people (or, I) wouldn't think of installing BioPerl for basic
functionality, because people (or I) get 805 modules, most obsolete, in
order to use 3, after waiting 15-20min for the tests to complete. At
least, I've sensed significant frustration in many of the posts relating
to installation on this list. I agree, everything should be geared
toward simplicity and efficiency, but for the user.

The base set would always be installed. The installation of the
sequence set would pull in the base set. There is no need to divide the
repos; this can all be driven by metadata - in CPAN::Meta format, so
that any CPAN distribution tool could actually pick out what is
necessary for a particular user's needs and install them. The bloat is
managed by managing the groupings, not the repositories. Sure, there
would be maintenance and documentation, same as in living projects. Maybe
new people would get interested if the work could be divided among many
functional units. And maybe the unused hundreds of modules would wither
as they should. Or, maybe you're right, time for BioPerl to ride into
Valhalla.
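
As a rough illustration of "driven by metadata", the sketch below builds
the runtime prerequisite fragment that a sequence-set distribution might
carry. It follows the prereqs layout of the CPAN::Meta spec, version 2
(phase, then relationship, then module and version); the helper name
prereqs_fragment and the use of "0" for "any version" here are
assumptions, not an agreed layout.

```python
import json


def prereqs_fragment(requires):
    """Build a CPAN::Meta spec v2 style 'prereqs' mapping for runtime requirements."""
    # "0" is used as the version requirement, meaning any version will do.
    return {"prereqs": {"runtime": {"requires": {name: "0" for name in requires}}}}


# Hypothetical fragment for the sequence set, which needs only the base set.
sequence_meta = prereqs_fragment(["BioPerl::Base"])
print(json.dumps(sequence_meta, indent=2))
```

A full META.json would also need the other required fields (name,
version, license, and so on); only the dependency part is shown.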


On 2016-06-01 08:37, Brian Osborne wrote:


Re: BP split progress and rationale

Brian Osborne-2
MAJ,

There must be something I’m not understanding, so let’s start this over. When I put Bio::Root back into bioperl-live last year, the only feedback I got was good, that splitting BioPerl into many parts was probably not a good idea. The balance may be in finding the correct number of parts, yes?

So - again, just so I understand - are you proposing to take Bio::Root out of bioperl-live again?

BIO



> On Jun 1, 2016, at 9:49 AM, Mark A. Jensen <[hidden email]> wrote:

Re: BP split progress and rationale

Fields, Christopher J
I think the Bio-Root split indicated there is definitely a certain threshold of pain involved w/ splitting out code. The key reason I suggested adding it back was from a maintenance and user standpoint; it was a pain and probably unnecessary as the initial step into splitting out code (it did work, but at some cost).

Saying that, I think there is a good middle ground. A key complaint about bioperl is the installation process and the ton of dependencies for modules that see little use. We work around these to some extent with ‘recommended’ dependencies, but it’s not the best solution in my opinion. Maybe we should just home in on the modules that have these additional downstream dependencies stifling installation and move them out, with a mind to keeping dependencies to an absolute minimum? We already know what these modules are (e.g. the Build.PL file lists the dependencies).

As a note, there has been some work on this already: Bio::SearchIO::blastxml resides in a separate repo now. I would also suggest we keep Bio::Coordinate and a few other modules split out; there have been very few complaints.

chris

> On Jun 1, 2016, at 8:57 AM, Brian Osborne <[hidden email]> wrote: