Re: module Bio::TreeIO


Re: module Bio::TreeIO

Jason Stajich-3
Hi Valerie -

Please ask this on the mailing list; it is better for everyone to hear
and help with questions.

You want to do this:
my $treeout = Bio::TreeIO->new(-format => 'newick',
                               -file   => ">MYFILENAME.tre");


I guess we need to explain the IO system in BioPerl more clearly for
new people, but it is the same idea for SeqIO, TreeIO, etc.: you
specify a filename to write to just like you would when opening a
filehandle in Perl:

open($fh, ">OUTPUTNAME")

or

Bio::TreeIO->new(-format => 'newick', -file => ">MYFILENAME.nh");

Filenames and extensions are whatever you want them to be; with
-format given explicitly, the format is not guessed from the filename
extension.
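
As a minimal sketch of the point above, the following script parses a small Newick string and writes it to a file. The tree string and the output filename `example_tree.nwk` are made up for illustration; the only part taken from the advice above is the leading `>` in the `-file` argument.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Hypothetical toy tree, parsed through an in-memory filehandle.
my $newick = '((A:0.1,B:0.2):0.05,C:0.3);';
open(my $in_fh, '<', \$newick) or die "Cannot open in-memory handle: $!";
my $treein = Bio::TreeIO->new(-format => 'newick', -fh => $in_fh);
my $tree   = $treein->next_tree;

# The leading '>' opens the file for writing, as with Perl's open().
my $outfile = 'example_tree.nwk';    # hypothetical filename
my $treeout = Bio::TreeIO->new(-format => 'newick', -file => ">$outfile");
$treeout->write_tree($tree);
$treeout->close;                     # flush and close the output file
```

Without the `>`, the file is opened for reading, so `write_tree` has no output file to write to.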

Presumably you have seen the HOWTO as well
http://bioperl.org/wiki/HOWTO:Trees

If there are things that are unclear, we'd appreciate it if you made
that known by commenting on the Discussion page that is linked to it
(tabs at the top of the page).

-jason
On Jan 8, 2009, at 9:30 AM, valerie storms wrote:

> Dear Jason,
>
> I would like to use the BioPerl modules to (1) construct a
> phylogenetic tree from a distance matrix, (2) put this tree in the
> Newick format
> and (3) save this tree in an output file.
> The first two steps (1,2) are fine by using
> Bio::Tree::DistanceFactory and Bio::TreeIO!
> But I have no idea how I can get my tree saved in an output file
> instead of printed to standard output??
>
> Can you help me with this?
> Many Thanks in advance!
>
> Best regards,
>
> Storms Valerie
> Phd student
> KULeuven Belgium
>
>
> p.s. The code I use
>
>
> #!/usr/bin/perl -w
>
> my $myDEBUG = 1;
> use lib '/users/sista/vstorms/local/lib/perl5/';
> use Bio::Perl;
> use Bio::Tree::DistanceFactory;
> use Bio::TreeIO;
> use Bio::Tools::Phylo::Phylip::ProtDist;
>
> my $outfile_protdist = '/users/sista/vstorms/LEGENDO/motif_detection/inputfiles/selection2/distance_matrix.txt';
> my $tree_file = '/users/sista/vstorms/LEGENDO/motif_detection/inputfiles/selection2/Tree.txt';
> if (-e $tree_file){
>   my $rm = 'rm -f '.$tree_file;
>   system $rm;
> }
> my $dist = Bio::Tools::Phylo::Phylip::ProtDist->new(
>                           -file=>"$outfile_protdist",
>                           -program=>"ProtDist");
> my $matrix = $dist->next_matrix;
> my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
> my $treein = Bio::TreeIO->new(-format => 'newick');
> my $treeout = Bio::TreeIO->new( -format => 'newick', -file => $tree_file);
> my $tree = $dfactory->make_tree($matrix);
> $treein->write_tree($tree);
> $treeout->write_tree($tree);
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>

Jason Stajich
[hidden email]



_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: module Bio::TreeIO

lskatz
I'm rehashing this old topic because it seems like the most relevant to my question.

After making a tree from a distance matrix, how would I calculate bootstrap values?  I have been shown that it can be done in R, but I would want a pure-perl method because the rest of my code revolves around BioPerl.  http://www.inside-r.org/packages/cran/ape/docs/boot.phylo 

I think that I can use Bio::Tree::Statistics to make 100 trees and then combine them with
assess_bootstrap

e.g.,

my @hundredRandomTrees=somehowMakeBootstrapTrees($distance_matrix,100);
my $stats=Bio::Tree::Statistics->new;
my $bs_tree = $stats->assess_bootstrap(\@hundredRandomTrees, $my_tree);

However, I would be making my own version of random or jackknifed trees, and it wouldn't be as well validated as something centralized.  Before I go down this road... is there any kind of standardized method in BioPerl for making bootstrap values from a distance matrix?

Or if not, does anyone have a suggestion on how to make the 100 trees from the matrix?

Re: module Bio::TreeIO

vitis
If I still remember my phylogenetics correctly, bootstrapping happens earlier, before a tree is built: it randomly resamples, with replacement, the characters used for tree construction. TreeIO, by contrast, is only a tool for reading, writing, and manipulating trees. You would have to find a way to do not only the resampling but also the tree construction, which requires specialized applications.
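
To make the resampling step concrete, here is a minimal pure-Perl sketch of one bootstrap replicate drawn from an alignment. The alignment is hypothetical toy data held in a plain hash, and `bootstrap_columns` is an illustrative helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Resample alignment columns with replacement to build one bootstrap replicate.
# $aln maps taxon name => aligned sequence; all sequences share one length.
sub bootstrap_columns {
    my ($aln) = @_;
    my @names = sort keys %$aln;
    my $len   = length $aln->{ $names[0] };

    # Draw $len column indices at random, with replacement.
    my @cols = map { int rand $len } 1 .. $len;

    # Apply the same column choice to every taxon so columns stay intact.
    my %rep;
    for my $name (@names) {
        $rep{$name} = join '', map { substr($aln->{$name}, $_, 1) } @cols;
    }
    return \%rep;
}

# Hypothetical toy alignment.
my %alignment = (
    taxonA => 'ACGTACGTAC',
    taxonB => 'ACGTTCGTAC',
    taxonC => 'ACGAACGTTC',
);
my $replicate = bootstrap_columns(\%alignment);
print "$_\t$replicate->{$_}\n" for sort keys %$replicate;
```

Each replicate would then go through the same distance and tree-building steps as the original alignment, giving one tree per replicate.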

Best.

Ke

On Mon, May 9, 2016 at 2:47 PM, lskatz <[hidden email]> wrote:
> I'm rehashing this old topic because it seems like the most relevant to my
> question.
>
> After making a tree from a distance matrix, how would I calculate bootstrap
> values?  I have been shown that it can be done in R, but I would want a
> pure-perl method because the rest of my code revolves around BioPerl.
> http://www.inside-r.org/packages/cran/ape/docs/boot.phylo
>
> I think that I can use Bio::Tree::Statistics to make 100 trees and then
> combine them with assess_bootstrap
>
> e.g.,
>
> my @hundredRandomTrees=somehowMakeBootstrapTrees($distance_matrix,100);
> my $stats=Bio::Tree::Statistics->new;
> my $bs_tree = $stats->assess_bootstrap(\@hundredRandomTrees, $my_tree);
>
> However, I would be making my own version of random or jackknifed trees and
> it wouldn't be as well validated as something centralized.  Before I go down
> this road... is there any kind of standardized method in BioPerl for making
> bootstrap values from a distance matrix?
>
> Or if not, does anyone have a suggestion on how to make the 100 trees from
> the matrix?




Re: module Bio::TreeIO

lskatz
That's what I thought originally too, but the R package made it look like there might be some magic bootstrapping going on with distances alone.  If there truly is no distance-matrix bootstrapping algorithm, then I am left to my own devices, and at least I can make use of assess_bootstrap().

By the way, if I can make a documentation bug report -- it looks like assess_bootstrap() only takes a list of trees as an argument but when I look at the source code it looks like it also can take a guide tree as a second parameter.  That second parameter is not in the documentation.  How would I specifically report that?

Re: module Bio::TreeIO

Roy Chaudhuri-3
Hi,

I can see why you were confused by the boot.phylo docs; they are not
very clear, but boot.phylo is resampling columns from an alignment.

I've uploaded a script to GitHub which does a BioPerl-based bootstrap
analysis, maybe it will be useful for you.
https://github.com/RoyChaudhuri/bioperl-scripts/blob/master/boottree
However, I would note that BioPerl is certainly not the most efficient
way of doing this, and neighbor-joining is not the most robust method of
constructing a tree (more sophisticated approaches are available in
programs such as RAxML and MrBayes).
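
For readers who want the general shape of such an analysis, here is a minimal sketch. It is not the boottree script linked above. It assumes a DNA alignment is available (hypothetical toy data here), and uses uncorrected distances and 10 replicates purely for illustration. `nj_tree` is a local helper, not a BioPerl function.

```perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::Align::Utilities qw(bootstrap_replicates);
use Bio::Align::DNAStatistics;
use Bio::Tree::DistanceFactory;
use Bio::Tree::Statistics;

# Hypothetical toy DNA alignment, read through an in-memory filehandle.
my $fasta = join "\n",
    '>taxonA', 'ACGTACGTACGTACGTACGT',
    '>taxonB', 'ACGTACGTACGTACGAACGT',
    '>taxonC', 'ACGTTCGTACCTACGAACGT',
    '>taxonD', 'ATGTTCGTACCTACGAACTT',
    '';
open(my $fh, '<', \$fasta) or die "Cannot open in-memory handle: $!";
my $aln = Bio::AlignIO->new(-format => 'fasta', -fh => $fh)->next_aln;

my $dstats   = Bio::Align::DNAStatistics->new;
my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');

# Local helper: alignment -> uncorrected distance matrix -> NJ tree.
sub nj_tree {
    my ($alignment) = @_;
    my $matrix = $dstats->distance(-align => $alignment, -method => 'Uncorrected');
    return $dfactory->make_tree($matrix);
}

# Build the guide tree from the full alignment first.
my $guide_tree = nj_tree($aln);

# Resample alignment columns with replacement; one tree per replicate.
my $replicates = bootstrap_replicates($aln, 10);
my @bs_trees   = map { nj_tree($_) } @$replicates;

# Annotate the guide tree's internal nodes with bootstrap support.
my $bs_tree = Bio::Tree::Statistics->new->assess_bootstrap(\@bs_trees, $guide_tree);
```

The annotated tree can then be written out with Bio::TreeIO in the usual way.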

You're right about the assess_bootstrap docs. You could post this as an
issue on GitHub (or even better, fix it and submit a pull request):
https://github.com/bioperl/bioperl-live/issues

Cheers,
Roy.

On 10/05/2016 13:54, lskatz wrote:

> That's what I thought originally too but the R package made it look like
> there might be some magic bootstrapping going on with distances alone.  If
> there truly is no distance-matrix bootstrapping algorithm then I am left to
> my devices and at least I can make use of assess_bootstrap().
>
> By the way, if I can make a documentation bug report -- it looks like
> assess_bootstrap() only takes a list of trees as an argument but when I look
> at the source code it looks like it also can take a guide tree as a second
> parameter.  That second parameter is not in the documentation.  How would I
> specifically report that?

Re: module Bio::TreeIO

Fields, Christopher J
In reply to this post by lskatz
You can either submit a Github Issue or fork the repo, generate the document fix, then generate a pull request (we accept either, though pull requests make our life a lot easier :)

chris

> On May 10, 2016, at 7:54 AM, lskatz <[hidden email]> wrote:
>
> That's what I thought originally too but the R package made it look like
> there might be some magic bootstrapping going on with distances alone.  If
> there truly is no distance-matrix bootstrapping algorithm then I am left to
> my devices and at least I can make use of assess_bootstrap().
>
> By the way, if I can make a documentation bug report -- it looks like
> assess_bootstrap() only takes a list of trees as an argument but when I look
> at the source code it looks like it also can take a guide tree as a second
> parameter.  That second parameter is not in the documentation.  How would I
> specifically report that?

Re: module Bio::TreeIO

lskatz
Thanks!  I'll try to make a pull request soon, and I will look into ways I can make bootstrapping trees.  Thank you also for your code example.

I only have a distance matrix derived from Mash, so I need a program to make a tree from distances.  I do not think I can use RAxML or MrBayes, right?  So BioPerl would be the best way to go.  Programs like EMBOSS's fneighbor had certain drawbacks, like taxon character limits, but BioPerl doesn't.

Re: module Bio::TreeIO

Roy Chaudhuri-3
Ah, in that case you could use Bio::Tree::DistanceFactory to make a tree
from your distance matrix, but I don't see how you could do any
bootstrapping - unless you have a way of resampling the raw data so you
can generate a set of bootstrap-replicate distance matrices.

Cheers,
Roy.

On 10/05/2016 16:17, lskatz wrote:

> Thanks!  I'll try to make a pull request soon, and I will look into ways I
> can make bootstrapping trees.  Thank you also for your code example.
>
> I only have a distance matrix derived from Mash and so I need a program to
> make a tree from distances.  I do not think I can use RAxML or Mr. Bayes,
> right?  So BioPerl would be the best way to go.  Programs like Emboss's
> fneighbor had certain drawbacks like taxon character limits, but BioPerl
> doesn't.

Re: module Bio::TreeIO

Youensclark, Ken - (kyclark)
In reply to this post by lskatz
On May 10, 2016, at 5:54 AM, lskatz <[hidden email]> wrote:
>
> That's what I thought originally too but the R package made it look like
> there might be some magic bootstrapping going on with distances alone.  If
> there truly is no distance-matrix bootstrapping algorithm then I am left to
> my devices and at least I can make use of assess_bootstrap().

I'm following this thread with interest not because I have anything to add about the Perl but because I've been using MinHash in an all-vs-all sample comparison application that runs on the stampede cluster at TACC:

        https://github.com/hurwitzlab/stampede-mash

I'd love to know what all you're doing with a distance matrix as far as analysis and visualization.

Ken

Re: module Bio::TreeIO

Fields, Christopher J
On May 10, 2016, at 10:51 AM, Youensclark, Ken - (kyclark) <[hidden email]> wrote:

>
> On May 10, 2016, at 5:54 AM, lskatz <[hidden email]> wrote:
>>
>> That's what I thought originally too but the R package made it look like
>> there might be some magic bootstrapping going on with distances alone.  If
>> there truly is no distance-matrix bootstrapping algorithm then I am left to
>> my devices and at least I can make use of assess_bootstrap().
>
> I'm following this thread with interest not because I have anything to add about the Perl but because I've been using MinHash in an all-vs-all sample comparison application that runs on the stampede cluster at TACC:
>
> https://github.com/hurwitzlab/stampede-mash
>
> I'd love to know what all you're doing with a distance matrix as far as analysis and visualization.
>
> Ken

Nice, we’ve been looking into mash as well for the same reasons.  Nothing to report yet unfortunately (we haven’t even started).

chris



Re: module Bio::TreeIO

Andreas Leimbach-2
just realized I didn't reply to list earlier, so here are my two cents.

well, if you're willing to go outside Perl, there are plenty of options
to cluster distances. E.g. R's `hclust` function with many hierarchical
clustering algorithms (complete, average etc.)
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html

For examples with dendrograms (and package `ape`) see:
https://rstudio-pubs-static.s3.amazonaws.com/1876_df0bf890dd54461f98719b461d987c3d.html

Use the R package `amap` for parallelization:
https://cran.r-project.org/web/packages/amap/index.html


Biopython also has most of R's distance/clustering functions
implemented:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc234

Nice Python examples, I think, are integrated in pyani from Leighton
Pritchard:
https://github.com/widdowquinn/pyani

I'm sure there are also Perl packages for clustering out there, just
don't know about them.

HTH,
Andreas

--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: [hidden email]

On 10.05.2016 20:08, Fields, Christopher J wrote:

> On May 10, 2016, at 10:51 AM, Youensclark, Ken - (kyclark) <[hidden email]> wrote:
>>
>> On May 10, 2016, at 5:54 AM, lskatz <[hidden email]> wrote:
>>>
>>> That's what I thought originally too but the R package made it look like
>>> there might be some magic bootstrapping going on with distances alone.  If
>>> there truly is no distance-matrix bootstrapping algorithm then I am left to
>>> my devices and at least I can make use of assess_bootstrap().
>>
>> I'm following this thread with interest not because I have anything to add about the Perl but because I've been using MinHash in an all-vs-all sample comparison application that runs on the stampede cluster at TACC:
>>
>> https://github.com/hurwitzlab/stampede-mash
>>
>> I'd love to know what all you're doing with a distance matrix as far as analysis and visualization.
>>
>> Ken
>
> Nice, we’ve been looking into mash as well for the same reasons.  Nothing to report yet unfortunately (we haven’t even started).
>
> chris
>
>

Re: module Bio::TreeIO

lskatz
In reply to this post by Fields, Christopher J
I think I'll have a multithreaded script this week using Mash v1.1 and BioPerl, and I will have it on github.  Pull requests will be encouraged.

Re: module Bio::TreeIO

lskatz
Never mind, I finished a first draft faster than I thought I would.  Please send me any pull requests.  I think this will be a really useful script.

https://github.com/lskatz/lskScripts/blob/master/mashtree.pl

Pull requests for mashtree.pl are encouraged.  Please be kind to my code and please keep it commented :)

My first speed test with 20 fastq.gz files, with 10 bootstraps (and one tree) was 94 seconds with 32 cpus.

Re: module Bio::TreeIO

Torsten Seemann
In reply to this post by lskatz
> I only have a distance matrix derived from Mash and so I need a program to
> make a tree from distances.  I do not think I can use RAxML or Mr. Bayes,
> right?

The PHYLIP format can be used for distance matrices, and PHYLIP has various tools to build trees from them:


The MASH dist output file is VERY close to being a valid .PHY file (If taxnames <= 8 chars) but Brian Ondov seems unwilling to implement it within MASH - see my issue here: https://github.com/marbl/Mash/issues/9
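
As a rough sketch of how small the gap is, the following pure-Perl snippet turns tab-separated pairwise lines (reference, query, distance, followed by any further columns) into a square PHYLIP-style distance matrix. The sample names and distances are invented, `pairs_to_phylip` is an illustrative helper, and the name field is padded or truncated to 10 characters as an assumption about what the downstream tool expects.

```perl
use strict;
use warnings;

# Convert "ref<TAB>query<TAB>distance<TAB>..." lines into a square
# PHYLIP-style distance matrix. Names are padded/truncated to 10 characters.
sub pairs_to_phylip {
    my (@lines) = @_;
    my (%dist, %seen);
    for my $line (@lines) {
        chomp $line;
        my ($ref, $query, $d) = split /\t/, $line;
        $dist{$ref}{$query} = $dist{$query}{$ref} = $d;    # store symmetrically
        $seen{$_} = 1 for $ref, $query;
    }
    my @names = sort keys %seen;
    my $out   = sprintf "%5d\n", scalar @names;            # header: number of taxa
    for my $row (@names) {
        my @vals = map {
              $row eq $_                ? 0
            : exists($dist{$row}{$_})   ? $dist{$row}{$_}
            :                             die "No distance for $row vs $_\n"
        } @names;
        $out .= sprintf('%-10.10s ', $row)
              . join(' ', map { sprintf '%.6f', $_ } @vals) . "\n";
    }
    return $out;
}

# Hypothetical pairwise lines; columns after the distance are ignored.
my @mash_lines = (
    "sampleA\tsampleB\t0.0123\t0\t800/1000",
    "sampleA\tsampleC\t0.0456\t0\t500/1000",
    "sampleB\tsampleC\t0.0789\t0\t300/1000",
);
my $phylip = pairs_to_phylip(@mash_lines);
print $phylip;
```

Names longer than 10 characters are truncated here, which can create duplicates; renaming taxa to short unique codes first avoids that.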

Torst
 


Re: module Bio::TreeIO

Liam Elbourne-3
I think it is 10 characters (the sequence starts at character 11), not that two characters are worth quibbling about. I just map the original names to hex numbers (which I think is likely to cover the number of sequences one can realistically align…), and then swap out the hex codes for the original names when I’ve finished with the alignment/tree etc., for what that is worth…
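
A minimal pure-Perl sketch of that kind of mapping might look like this. The `T` prefix, the 8-digit width, the helper names, and the taxon names are all illustrative choices rather than anything prescribed by PHYLIP or BioPerl.

```perl
use strict;
use warnings;

# Assign each distinct long name a short fixed-width hex code.
# Returns two lookup tables: name => code and code => name.
sub encode_names {
    my (@names) = @_;
    my (%code_for, %name_for);
    my $n = 0;
    for my $name (@names) {
        next if exists $code_for{$name};
        my $code = sprintf 'T%08X', $n++;    # 9 characters, within a 10-character limit
        $code_for{$name} = $code;
        $name_for{$code} = $name;
    }
    return (\%code_for, \%name_for);
}

# Swap codes back to the original names in any text, e.g. a Newick string.
sub decode_text {
    my ($text, $name_for) = @_;
    $text =~ s/\b(T[0-9A-F]{8})\b/exists $name_for->{$1} ? $name_for->{$1} : $1/ge;
    return $text;
}

# Hypothetical long taxon names.
my @long = ('Escherichia_coli_K12_MG1655',
            'Salmonella_enterica_LT2',
            'Klebsiella_pneumoniae_HS11286');
my ($code_for, $name_for) = encode_names(@long);

# A tree built on the short codes, then restored to the long names.
my $newick   = sprintf '((%s:0.1,%s:0.2):0.05,%s:0.3);', map { $code_for->{$_} } @long;
my $restored = decode_text($newick, $name_for);
print "$newick\n$restored\n";
```

Keeping the code-to-name table alongside the output files makes the swap repeatable after later tree-building steps.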

Liam.





On 11 May 2016, at 9:00 AM, Torsten Seemann <[hidden email]> wrote:

I only have a distance matrix derived from Mash and so I need a program to
make a tree from distances.  I do not think I can use RAxML or Mr. Bayes,
right? 

The PHYLIP format can be used for distance matrices, and PHYLIP has various tools to build trees from them:


The MASH dist output file is VERY close to being a valid .PHY file (If taxnames <= 8 chars) but Brian Ondov seems unwilling to implement it within MASH - see my issue here: https://github.com/marbl/Mash/issues/9

Torst
 

Re: module Bio::TreeIO

Fields, Christopher J
I think this functionality (mapping from long name to short PHYLIP-compatible and back) is in bioperl.  Would have to dig it up but I recall support being added many moons ago.

chris

On May 10, 2016, at 8:30 PM, Liam Elbourne <[hidden email]> wrote:

I think it is 10 characters (sequence starts at character 11), not that two characters is worth quibbling about, I just map the original names to hex numbers (which I think is likely to cover the number of sequences one can realistically align…), and then swap out the hex codes when I’ve finished with the alignment/tree etc, for what that is worth…

Liam.






Re: module Bio::TreeIO

Liam Elbourne-3
I’m sure it is, just another case of reinventing the wheel taking less time than reading the manual….


Liam.



On 11 May 2016, at 12:18 PM, Fields, Christopher J <[hidden email]> wrote:

I think this functionality (mapping from long name to short PHYLIP-compatible and back) is in bioperl.  Would have to dig it up but I recall support being added many moons ago.

chris


Re: module Bio::TreeIO

lskatz
FYI, I graduated the script to a github repository so that it can get its own readme, issues page, etc.

https://github.com/lskatz/mashtree 