Subject: Archive-Zip-1.27_01 consumes too much memory when iterating over archive members
Date: Mon, 8 Jun 2009 14:24:41 -0500
To: bug-Archive-Zip [...] rt.cpan.org
From: Jeff Holt <jeff.holt [...] method-r.com>
I have a zip file with ~275,000 members. When run under ActivePerl 5.10.0
build 1004, the following code consumes ~320 MB of memory on MSWin32 and
~290 MB on Linux x86. I also checked version 1.26; its memory consumption is
similar to that of 1.27_01.
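use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
my $zip = Archive::Zip->new();
# read() keeps an object for every member in memory
$zip->read($ARGV[0]) == AZ_OK or die "Cannot read $ARGV[0]\n";
print "My process id is $$. Press [Enter] to continue\n";
my $junk = <STDIN>;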
It consumes so much memory because the read method pushes an object for every
member onto an internal array, which the members method can later return.
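A rough back-of-envelope check shows what that retention costs per member. This sketch assumes the reported figures are MiB (2**20 bytes) and charges the whole process size to member objects, so it overstates the true per-member cost somewhat (the interpreter's own baseline is included). `bytes_per_member` is a hypothetical helper, not part of Archive::Zip.

```perl
use strict;
use warnings;

# Figures reported above; "mb" is assumed to mean MiB (2**20 bytes).
my %reported_mib = (MSWin32 => 320, Linux => 290);
my $members      = 275_000;

# Rough upper bound: the process total also includes the interpreter's
# baseline, so the real per-member cost is somewhat lower than this.
sub bytes_per_member {
    my ($mib, $count) = @_;
    return $mib * 2**20 / $count;
}

# Prints roughly 1106 bytes (Linux) and 1220 bytes (MSWin32) per member.
for my $os (sort keys %reported_mib) {
    printf "%-7s ~%.0f bytes per member\n",
        $os, bytes_per_member($reported_mib{$os}, $members);
}
```

At roughly a kilobyte per retained member, memory use grows linearly with the number of entries, which is what makes a 275,000-member archive expensive.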
Attached is a patch to 1.27_01 that adds a new feature: callbacks for
processing individual members. The patch assumes that if you are processing
members iteratively, you have no need to call the members method. I've
thought carefully about this and can't think of any reason why the assumption
is faulty.
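To make the proposed behaviour concrete, here is a minimal, self-contained sketch of the design: each header is handed to a callback as it is read, and retention in the members array is skipped on request. `MiniReader`, `read_headers`, and the use of plain file-name strings as headers are hypothetical stand-ins, not Archive::Zip code; only the option names `readHeaderCallback` and `noPushHeaders` mirror the keys the patch introduces.

```perl
use strict;
use warnings;

# Hypothetical, simplified reader that illustrates the patch's idea.
# It is not Archive::Zip code; only the two option names mirror the patch.
package MiniReader;

sub new {
    my ($class, %opt) = @_;
    return bless { members => [], %opt }, $class;
}

# Process member headers one at a time (a stand-in for walking the
# central directory of a real zip file).
sub read_headers {
    my ($self, @headers) = @_;
    for my $h (@headers) {
        # Hand each header to the caller's callback, if one was supplied.
        $self->{readHeaderCallback}->($h) if $self->{readHeaderCallback};
        # Retain the header (the current behaviour) unless told not to.
        push @{ $self->{members} }, $h unless $self->{noPushHeaders};
    }
    return scalar @headers;
}

# Returns whatever was retained; empty when noPushHeaders is set.
sub members { return @{ $_[0]{members} } }

package main;

my $seen = 0;
my $streaming = MiniReader->new(
    readHeaderCallback => sub { $seen++ },
    noPushHeaders      => 1,
);
$streaming->read_headers(map "$_.txt", 1 .. 1000);
printf "callback saw %d headers; %d retained\n", $seen, scalar $streaming->members;
```

With `noPushHeaders` set, memory stays flat however many headers are read, and that is exactly why `members` cannot be relied on in that mode.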
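# This creates a zip file (~25 MB) with 275,000 empty members, suitable for testing:
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
my $wz = Archive::Zip->new();
$wz->addString('', "$_.txt") for 1 .. 275_000;
$wz->writeToFileNamed('test.zip') == AZ_OK or die "Cannot write test.zip\n";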
And here's code that uses the patch to process members iteratively with much
less memory (~8 MB). As a result it can often run faster, sometimes
significantly so, depending on available physical memory or competition for
memory.
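use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
# Called once for each member header as it is read; process the member
# here instead of collecting it.
sub mbrDelegate {
    my $mbr = shift;
}
my $zh = Archive::Zip->new();
# Both keys below are added by the attached patch to 1.27_01.
$zh->{readHeaderCallback} = \&mbrDelegate;   # invoked once per member
$zh->{noPushHeaders} = 1;                    # don't push members onto the members array
$zh->read("test.zip") == AZ_OK or die "Cannot read test.zip\n";
print "My process id is $$. Press [Enter] to continue\n";
my $junk = <STDIN>;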
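Because `mbrDelegate` receives only the member, any results have to be collected outside it. One option is a closure that carries its own totals. This sketch assumes the callback is passed an Archive::Zip member object (as `my $mbr = shift;` suggests) and uses only the member methods `fileName` and `uncompressedSize`. `make_tally_delegate` and `StubMember` are hypothetical; `StubMember` stands in for a real member so the sketch runs without the patch.

```perl
use strict;
use warnings;

# Returns a per-member callback that accumulates totals in %$tally through a
# closure, so no global variables are needed. It calls only fileName() and
# uncompressedSize(), both of which Archive::Zip member objects provide.
sub make_tally_delegate {
    my ($tally) = @_;
    return sub {
        my $mbr = shift;
        $tally->{count}++;
        $tally->{bytes} += $mbr->uncompressedSize;
        $tally->{txt}++ if $mbr->fileName =~ /\.txt\z/;
        return;
    };
}

# Minimal stand-in for a member object, used only to exercise the delegate.
package StubMember;
sub new              { my ($class, %f) = @_; return bless {%f}, $class }
sub fileName         { return $_[0]{fileName} }
sub uncompressedSize { return $_[0]{uncompressedSize} }

package main;

my %tally = (count => 0, bytes => 0, txt => 0);
my $delegate = make_tally_delegate(\%tally);
$delegate->(StubMember->new(fileName => "$_.txt", uncompressedSize => 0)) for 1 .. 3;
$delegate->(StubMember->new(fileName => 'notes.md', uncompressedSize => 42));
# prints "4 members, 42 bytes, 3 .txt files"
printf "%d members, %d bytes, %d .txt files\n", @tally{qw(count bytes txt)};
```

With the patch applied, such a delegate would be installed in place of `\&mbrDelegate`, e.g. `$zh->{readHeaderCallback} = make_tally_delegate(\%tally);`, together with `$zh->{noPushHeaders} = 1;`. The totals then live in `%tally` rather than in an array of 275,000 retained members.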