Hi Paul. Thanks for the analysis of the problem.
I think I may be able to help you create files that will let you
experiment with the problem further, if you want.
The linux.bin.gz file is the uClinux kernel for a Beyonwiz PVR, and it
has the PVR's root file system embedded in it as a ROMFS. The "real"
code that I'm running extracts and unpacks the ROMFS (and also lets me
pack a new ROMFS into the space where the old one sat). The root ROMFS
contains a 10MB file, /bank0, containing all null bytes. My
understanding is that it's mmapped to provide an allocation arena for
malloc. I think that it's the unpacking of this large contiguous chunk
of null bytes that's forcing the large buffer size in Gunzip. I don't
know why Perl's malloc for that buffer is choking.
I suspect that it may be possible to create a file that contains a large
nulled-out block to trigger the same problem. Unfortunately, a file
made just by compressing a single 10MB chunk of zeros unpacks just fine:
dd if=/dev/zero bs=1k count=10240 | gzip -c > null.gz
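A closer match to the real image might be the null block sandwiched
between stretches of incompressible data. Here's a sketch of how I'd
build such a test file in Perl (the file name and the sizes here are
arbitrary):

use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Mimic the ROMFS layout: incompressible data around a 10MB null run.
my $data = join '',
    (map { chr int rand 256 } 1 .. 65536),   # incompressible prefix
    "\0" x (10 * 1024 * 1024),               # the large null block
    (map { chr int rand 256 } 1 .. 65536);   # incompressible suffix

gzip \$data => "mixed-null.gz"
    or die "gzip failed: $GzipError\n";

I haven't verified that this reproduces the failure, but it should at
least force Gunzip to produce the whole null block in one burst.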
Unfortunately, your workaround won't do the job I need. I want the code
to be able to run on a Windows machine that doesn't have g[un]zip.
Fortunately, the workaround in the attached script does work. However,
there doesn't seem to be any way to pass the read() buffer length
through to Gunzip::gunzip(). The workaround also works in my "real"
code.
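In case it's useful to anyone else hitting this, here's a minimal
sketch of the kind of bounded-read workaround I'm describing, using
the OO interface's read() with an explicit length rather than the
one-shot gunzip() (the file name and the 16k chunk size are just
placeholders):

use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

my $z = IO::Uncompress::Gunzip->new("linux.bin.gz")
    or die "gunzip failed: $GunzipError\n";

# Pull the uncompressed stream through in bounded 16k chunks.
my ($buf, $out) = ('', '');
my $status;
while (($status = $z->read($buf, 16384)) > 0) {
    $out .= $buf;
}
die "read failed: $GunzipError\n" if $status < 0;
$z->close;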
Thanks for your help.
Peter
Dr Peter Lamb
Project Leader
Information Engineering Laboratory
CSIRO ICT Centre
Innovative ICT transforming Australian industries
Post: PO Box 664, Canberra, ACT 2601, Australia
Office: Computer Science & Information Technology Building (Building 108)
Australian National University, Acton, ACT 2601
T: +61 2 6216 7047, F: +61 2 6216 7111
www.ict.csiro.au
> -----Original Message-----
> From: Paul Marquess via RT [mailto:bug-IO-Compress-Zlib@rt.cpan.org]
> Sent: Thursday, 24 July 2008 18:54
> To: Lamb, Peter (ICT Centre, Acton)
> Subject: [rt.cpan.org #37833] "Out of memory error" unzipping file in
> IO::Uncompress::Gunzip
>
> <URL: http://rt.cpan.org/Ticket/Display.html?id=37833 >
>
> Hi Peter
>
> > I took the liberty of moving your print statement into the read loop
> > in the code you posted, because otherwise all you see is the "Out of
> > memory" message as with my original post; I also set autoflush on
> > STDOUT. The script as I ran it and the output are attached. I had
> > tried doing much the same myself to see if I could get a handle on the
> > problem, but I got nowhere. The uncompressed file is about 20MB; the
> > gunzip dies about 2.5MB into the uncompressed stream.
> >
> > ./gunzipbug1.pl linux.bin.gz > gunzipbug1.out 2>&1
> >
> > If you want to try the problem yourself, you can run Cygwin on your
> > Windows box (free download from http://www.cygwin.com/); it's a Unix
> > compatibility shell for Windows, and I'm running it under Win XP 2002
> > SP2.
>
> I've used cygwin before, and if I had enough space on my work laptop
> I'd install it. :-)
>
> > I don't know why you got two copies of the bug report; I only have a
> > record of a single email message to bug-IO-Compress-Zlib@rt.cpan.org,
> > but I did get two automated responses with the two ticket numbers
> > 37833 and 37834. Sorry for any inconvenience.
>
> No problem.
>
> > Anyway, if there are more tests you'd like me to run, please let me
> > know.
>
> The output you've sent me has given me enough info to spot the
> problem.
>
> ...
> Got 2554049 bytes
> Out of memory during "large" request for 536875008 bytes...
>
> If I look at the output from running a slight variant on the same
> script (I output the number of bytes uncompressed in each call to read) on a
> Linux box this is what I see
>
> ...
> Got 52941 bytes -> total 2554049 bytes
> Got 10499407 bytes -> total 13053456 bytes
>
> Notice the size of the uncompressed data in the call after the last
> successful call you got. It looks like cygwin is failing when it is
> trying to create a 10 meg output buffer. Not sure why cygwin reports a
> request for 500meg though.
>
> So here is where I think the problem lies - at the moment my code
> reads the compressed data in 4k chunks. It will carry out uncompression on
> that input buffer until it is exhausted, regardless of how much
> uncompressed data that will generate - the output buffer will be grown
> if needed. So basically the output buffer size is unbounded.
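>
> To put a number on that: 10 meg of nulls deflates to only about 10k,
> so even a single 4k chunk of compressed input can represent several
> meg of output. A quick sketch of the arithmetic (illustrative only,
> not the module's internals):
>
> use strict;
> use warnings;
> use IO::Compress::Gzip qw(gzip $GzipError);
>
> # 10MB of nulls compresses at roughly 1000:1 under gzip, so a 4k
> # slice of that stream inflates to several megabytes in one go.
> my $zeros = "\0" x (10 * 1024 * 1024);
> gzip \$zeros => \my $packed or die $GzipError;
> printf "10MB of nulls -> %d compressed bytes\n", length $packed;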
>
> The obvious fix for this is for me to make the output buffer size
> bounded. Unfortunately that will take a bit of work on my part.
>
> In the interim if you just want to uncompress a file you can use this
>
> system "gunzip -c $inputfile >$outputfile";
>
> Or if you need to process the contents, this
>
> open F, "gunzip -c $inputfile|";
> while (<F>)
> {
> # do something
> }
>
> Are those workarounds good enough for your purposes?
>
> Paul
>