Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Bio-Phylo CPAN distribution.

Report information
The Basics
Id: 43821
Status: resolved
Priority: 0/
Queue: Bio-Phylo

People
Owner: Nobody in particular
Requestors: florent.angly [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.17_RC6
Fixed in: (no value)



Subject: Tree drawing is slow for large trees
Hi Rutger, First of all, thanks for looking at my previous bug reports and suggestions. Seeing how the last version of Bio::Phylo was from 2007, I feared that you weren't maintaining the module anymore. Anyways, I have a big Newick tree with over 2000 nodes that I plot. But it takes ~20 minutes to do that, which is quite long. So, I ran a test script (attached) with the above-mentioned tree (attached) and profiled it with NYTProf (results attached). It's very clear that for large tree, all the time is spent in the function Bio::Phylo::Mediators::NodeMediators::get_link. In particular that piece of code near line 473 that looks for the parent of a given node: for my $tuple ( @{$function} ) { if ( $tuple->[0] == $id ) { return $node_object_for_id{ $tuple->[1] }; } } By inspection, I figured out that @{$function} is an array of all (node, parent) tuples, and the code goes through each pair to find who the parent of the node is. It seems that method doesn't scale up very well. The more nodes, the more tuples, the larger the number of iterations before finding the parent. I would like to suggest that maybe you use hashes instead of arrays to store the parent-child information. Doing so might take a little bit more of memory and time upfront, but you would be able to query extremely quick who is the parent of a given node (no loop needed!). I illustrated this approach in this snippet of code: my %parentof = ( 'node10' => 'node1', 'node123' => 'node0' ); my $node = 'node123'; my $parent = $parentof{$node}; print "The parent of node $node is $parent\n"; I realize that in the module, you would probably have to create not one but several hashes to save all of the different types of relations between nodes: parent, sister, ... I think it's totally worth it. What do you think? Regards, Florent
Subject: nytprof.tar.gz
Download nytprof.tar.gz
application/gzip 1.3m

Message body not shown because it is not plain text.

Subject: VPT.tre
Download VPT.tre
application/octet-stream 49.3k

Message body not shown because it is not plain text.

Subject: tree_drawing_bench.pl
#! /usr/bin/env perl print "Starting\n"; use strict; use warnings; use Bio::Phylo::IO; use Bio::Phylo::Treedrawer; my $in_file = 'VPT.tre'; print "Test\n"; # Function parameters my $in_format = 'newick'; # [newick] my $out_format = 'svg'; # [svg] my $shape = 'diag'; # [rect|diag|curvy] my $mode = 'phylo'; # [clado|phylo] # clado is slooooow... my ($width, $height, $padding, $text_vert_offset, $text_horiz_offset, $text_width, $font_height, $stroke_width, $line_spacing) = (24279.38, 30696.38, 1458.19, 3, 5, 548, 9, 2, 0.5); # Parse tree print "Parsing tree...\n"; my $tree = Bio::Phylo::IO->parse('-format' => $in_format, '-file' => $in_file)->first; print " done!\n"; # Clean / prepare tree #$tree = remove_ancestor_names($tree); #$tree = leaves_taxid_to_names($tree, $taxo_names); # Create tree print "Preparing SVG...\n"; my $treedrawer = Bio::Phylo::Treedrawer->new( -width => $width, -height => $height, -padding => $padding, -text_vert_offset => $text_vert_offset, -text_horiz_offset => $text_horiz_offset, -text_width => $text_width, -shape => $shape, -mode => $mode, -format => $out_format ); print " done!\n"; print "Setting tree...\n"; $treedrawer->set_tree($tree); print " done!\n"; print "Drawing tree...\n"; #my $count = 1000; my $count = 1; for (my $i = 1; $i <= $count; $i++) { my $svg_tree = $treedrawer->draw; } print " done!\n";
Hi Rutger, Sorry I did not realize that you had significantly changed the internals of Bio::Phylo to make the nodes inherit from Bio::Phylo::Listable in version 0.17 RC8. The same tree drawing in 0.17 RC8 took less than 1 minute (compared to ~20 min in 0.17 RC6). What an improvement! Thanks for the good work. I suppose this bug can be closed.
Resolved in 0.17 RC8