
This queue is for tickets about the HTML-TableExtract CPAN distribution.

Report information
The Basics
Id: 42869
Status: resolved
Priority: 0/
Queue: HTML-TableExtract

People
Owner: MSISK [...] cpan.org
Requestors: greg [...] primate.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Possible Memory Leak in TreeBuilder mode
Date: Wed, 28 Jan 2009 13:34:49 -0800
To: bug-HTML-TableExtract [...] rt.cpan.org
From: Greg Michalec <greg [...] primate.net>
Hi - I'm running into a problem using TableExtract to parse a directory of HTML files. The memory footprint of the process keeps growing. This does not occur when I use TableExtract in its default HTML::Parser mode, so I'm guessing there is a problem with the way TableExtract destroys its HTML::TreeBuilder object. According to the HTML::TreeBuilder documentation, its objects must be explicitly deleted, due to the nature of HTML::Element tree objects.

Here's a test script that exhibits the problem:

<code>
#!/usr/bin/perl
use HTML::TableExtract qw(tree);

my $table = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";
my $html = "<html><body>" . $table x 3 . "</body></html>";

foreach ( my $x = 0; $x <= 20; $x++) {
    my $p = HTML::TableExtract->new();
    $p->parse($html);
    $p->eof;
    $p->delete;
    if (-f "/proc/$$/statm") {
        my $mem = `cat /proc/$$/statm`;
        $mem =~ s/^(\d+).*/$1/s;
        print "$x: $mem\n";
    }
}
</code>

Here's my system info:

Ubuntu 8.10 (2.6.27-9-generic x86_64)
perl v5.10.0 built for x86_64-linux-gnu-thread-multi
HTML::TableExtract 2.10-3
HTML::TreeBuilder 3.23
(all perl modules are from current ubuntu 8.10 packages)

Thanks!
Fixed in 2.11. Parsing status is tracked via the eof() method, which parse_file() calls implicitly. When eof() is called, _reset_state() is invoked automatically.