Subject: Possible Memory Leak in TreeBuilder mode
Date: Wed, 28 Jan 2009 13:34:49 -0800
To: bug-HTML-TableExtract [...] rt.cpan.org
From: Greg Michalec <greg [...] primate.net>
Hi -
I'm running into a problem with using TableExtract to parse a directory
of HTML files. The memory footprint of the process continues to grow and
grow. This does not occur when I use TableExtract in its default
HTML::Parser mode, so I'm guessing there is a problem with the way
TableExtract is destroying its HTML::TreeBuilder object. According to
the HTML::TreeBuilder documentation, its objects must be explicitly
deleted, due to the nature of HTML::Element tree objects.
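For context, here is a minimal sketch of why trees like this need an explicit delete under Perl's reference counting. It assumes the nodes hold strong references in both directions (parent to child and child to parent). The `Node` class and its methods are hypothetical stand-ins written for illustration, not HTML::Element's actual implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an HTML::Element-style node: each child holds a
# strong reference to its parent, and each parent to its children.
package Node;

our $destroyed = 0;    # counts how many nodes Perl has actually freed

sub new {
    my ($class) = @_;
    return bless { parent => undef, children => [] }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;              # child -> parent
    push @{ $self->{children} }, $child;   # parent -> child (completes a cycle)
    return $child;
}

# Break the cycles by hand, in the spirit of an explicit delete call.
sub delete {
    my ($self) = @_;
    $_->delete for @{ $self->{children} };
    $self->{children} = [];
    $self->{parent}   = undef;
    return;
}

sub DESTROY { $destroyed++ }

package main;

# Case 1: the tree goes out of scope without delete; the cycle keeps it alive.
{
    my $root = Node->new;
    $root->add_child(Node->new);
}
my $freed_without_delete = $Node::destroyed;

# Case 2: delete is called before the tree goes out of scope.
{
    my $root = Node->new;
    $root->add_child(Node->new);
    $root->delete;
}
my $freed_with_delete = $Node::destroyed - $freed_without_delete;

print "freed without delete: $freed_without_delete\n";
print "freed with delete:    $freed_with_delete\n";
```

In this sketch the first block frees no nodes and the second frees both, which is the behaviour I would expect from a correctly deleted tree.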
Here's a test script that exhibits the problem:
<code>
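#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract qw(tree);    # tree mode: HTML::TreeBuilder-based

# Build a document containing three 100-row tables.
my $table = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";
my $html  = "<html><body>" . $table x 3 . "</body></html>";

for my $x (0 .. 20) {
    my $p = HTML::TableExtract->new();
    $p->parse($html);
    $p->eof;
    $p->delete;    # explicit delete, as the HTML::TreeBuilder docs require

    # Report the process's total size (first field of statm, in pages).
    if (-f "/proc/$$/statm") {
        my $mem = `cat /proc/$$/statm`;
        $mem =~ s/^(\d+).*/$1/s;
        print "$x: $mem\n";
    }
}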
</code>
Here's my system info:
Ubuntu 8.10 (2.6.27-9-generic x86_64)
perl v5.10.0 built for x86_64-linux-gnu-thread-multi
HTML::TableExtract 2.10-3
HTML::TreeBuilder 3.23
(all Perl modules are from the current Ubuntu 8.10 packages)
Thanks!