Subject: New version of Bio::Phylo unsustainably increases memory usage
Date: Tue, 26 Sep 2006 23:19:28 -0400
To: bug-bio-phylo [...] rt.cpan.org
From: Allen Smith <easmith [...] beatrice.rutgers.edu>
$Id: Phylo.pm 2196 2006-09-07 21:35:47Z rvosa $
I tried upgrading from Bio::Phylo 0.12 to the latest CPAN version, 0.15,
and found that the following code now uses vastly more memory: we're
talking a difference from 10-20MB to close to a gigabyte by the time I
stopped the program, and it had not finished yet. Note that I have a total
of 291 trees. I suspect that part of the problem may be that Bio::Phylo is
caching too much, or that it assumes that all trees in use by one program
belong together in a "forest" - the latter would not make sense for my
usage. I'm reverting to the earlier version (0.12). (The change from newick
to fastnewick to newick, and now back again to fastnewick with the
reversion, will also introduce some incompatibilities, which is highly
unfortunate; please do not make such name changes in the future. Keep
calling it fastnewick, and make a copy of it as newick if need be.)
foreach $tree (sort {($num_terminals{$a} <=> $num_terminals{$b}) ||
                     ($a cmp $b)} (keys %overall_group_species_seen)) {
    foreach $group (sort {(scalar(keys %{ $overall_group_species_seen{$tree}{$a} })
                           <=>
                           scalar(keys %{ $overall_group_species_seen{$tree}{$b} }))
                          || ($a cmp $b)}
                    (keys %{ $overall_group_species_seen{$tree} })) {
        my(@nodes) = ();
        foreach $node (@{ $treebase_trees{$tree}->get_terminals }) {
            if (defined($node->get_name) &&
                exists($overall_group_species_seen{$tree}{$group}{$node->get_name})) {
                push @nodes, $node;
            }
        }
        unless (scalar(@nodes) > 1) {
            if (scalar(@nodes)) {
                warn "$tree $group: Have only 1 node ("
                    . join(" ",sort(map {$_->get_name} (@nodes)))
                    . ")\n";
            } else {
                warn "$tree $group: No nodes!\n";
            }
            $problem = 1;
            next;
        }
        #my $mrca = my_get_mrca($tree,@nodes);
        my $mrca = $treebase_trees{$tree}->get_mrca(\@nodes);
        unless (defined($mrca)) {
            warn "$tree $group: No defined MRCA for "
                . join(" ",sort(keys %{ $overall_group_species_seen{$tree}{$group} }))
                . "\n";
            $problem = 1;
            next;
        }
        my(@descendants) = grep {defined($_->get_name) &&
                                 length($_->get_name)}
                           (@{ $mrca->get_descendants });
        if (scalar(@descendants) > scalar(@nodes)) {
            foreach $species (keys %{ $overall_group_species_seen{$tree}{$group} }) {
                $species_seen_outside_group{$group}{$species} = 1;
            }
        }
    }
}
warn "Have " . scalar(keys %species_seen_outside_group)
    . " normal groups with species seen outside group\n";
--
Allen Smith http://cesario.rutgers.edu/easmith/
February 1, 2003 Space Shuttle Columbia
Ad Astra Per Aspera To The Stars Through Asperity