This queue is for tickets about the IO-Compress-Zlib CPAN distribution.

Report information
The Basics
Id: 38784
Status: resolved
Priority: 0/
Queue: IO-Compress-Zlib

People
Owner: Nobody in particular
Requestors: jesper [...] krogh.cc
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Performance of IO::Uncompress::Gunzip .. really bad.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use FindBin;
    use lib $FindBin::Bin . "/../../lib/";
    use IO::Uncompress::Gunzip;
    use Tools::Timer;

    Tools::Timer->timer("start gunzip");
    for (my $i = 0; $i < 10; $i++) {
        open FH, "zcat $ARGV[0] |";
        while (my $line = <FH>) {
            my $var = $line;
        }
        close(FH);
    }
    Tools::Timer->timer("stop gunzip");

    Tools::Timer->timer("start IO::Uncompress");
    for (my $i = 0; $i < 10; $i++) {
        my $fh = new IO::Uncompress::Gunzip($ARGV[0]);
        while (my $line = <$fh>) {
            my $var = $line;
        }
        close($fh);
    }
    Tools::Timer->timer("stop IO::Uncompress");
    Tools::Timer->print_result;

Produces:

    timer: start gunzip - stop gunzip 7.708832s ( 5.2%)
    timer: stop gunzip - start IO::Uncompress 0.000049s ( 0.0%)
    timer: start IO::Uncompress - stop IO::Uncompress 141.507715s (94.8%)
    timer: Total timed 149.216596s

Summary: zcat in a pipe is roughly 20 times faster than IO::Uncompress::Gunzip.

Jesper
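The script above loads Tools::Timer from a local lib directory, so others cannot run it as-is. A minimal self-contained sketch of the same comparison, assuming Time::HiRes for wall-clock timing, could look like this. The helper names time_it, zcat_lines and gunzip_lines are hypothetical, and the list form of the zcat pipe assumes a Unix-like system with zcat on the PATH.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # high-resolution time(), standing in for Tools::Timer
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: run $code $n times, return elapsed wall-clock seconds.
sub time_it {
    my ($n, $code) = @_;
    my $start = time();
    $code->() for 1 .. $n;
    return time() - $start;
}

# Hypothetical helper: read a gzip file line by line through a zcat pipe
# (readline handled by perl's C code); returns the number of lines.
sub zcat_lines {
    my ($file) = @_;
    open(my $fh, '-|', 'zcat', $file) or die "cannot run zcat: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close($fh);
    return $count;
}

# Hypothetical helper: read the same file line by line with
# IO::Uncompress::Gunzip (readline implemented in Perl).
sub gunzip_lines {
    my ($file) = @_;
    my $fh = IO::Uncompress::Gunzip->new($file)
        or die "gunzip failed: $GunzipError";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    $fh->close;
    return $count;
}

# Only run the benchmark when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    printf "zcat pipe: %.3fs\n", time_it(10, sub { zcat_lines($file) });
    printf "Gunzip:    %.3fs\n", time_it(10, sub { gunzip_lines($file) });
}
```

It would be run as `perl bench.pl some-file.gz` (both names are placeholders). No timings are claimed here; the numbers will depend on the file and the machine.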

Hi Jesper,

thanks for the report. I don't have Tools::Timer (is it available on the net somewhere?), so I added my own code to measure elapsed time in your script.

The first test I did was with a 2 MB text file. There I see IO::Uncompress::Gunzip about 10 times slower than zcat. Depending on the file, IO::Uncompress::Gunzip is between 5 and 10 times slower.

If I change your script slightly so that it reads fixed-size blocks from the compressed files, rather than a line at a time, I get a completely different result. So instead of this idiom to read from the file

    while (my $line = <FH>) {
        my $var = $line;
    }

I use this

    my $line;
    while (read(FH, $line, 4096) > 0) {
        my $var = $line;
    }

and both zcat and IO::Uncompress::Gunzip come out roughly the same. With some files zcat is marginally faster; with others IO::Uncompress::Gunzip is slightly better. That means the raw decompression speed is about the same for both.

Where they differ, and the reason for the big difference you see, is how the readline functionality is implemented. In the zcat case, reading lines from the pipe is handled internally by perl in C code. For IO::Uncompress::Gunzip, the readline functionality is written entirely in Perl. Perl code is never going to win that contest.

Paul
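If the data still has to be processed line by line, one possible workaround suggested by the explanation above is to keep the fast block reads and do the line splitting yourself with a single regex per block, so readline is never called on the IO::Uncompress::Gunzip handle. The sketch below is illustrative only: each_line_blocked and its 64 KiB default block size are hypothetical, it assumes "\n" line endings (it ignores $/), and it has not been benchmarked here.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: call $callback once for each line of a gzip file.
# Data is pulled with read() in fixed-size blocks and split into lines in
# one regex pass per block, so <$fh> is never called on the Gunzip handle.
sub each_line_blocked {
    my ($file, $callback, $block_size) = @_;
    $block_size ||= 65536;                      # assumed default: 64 KiB
    my $fh = IO::Uncompress::Gunzip->new($file)
        or die "gunzip failed: $GunzipError";
    my $pending = '';                           # incomplete line carried between blocks
    my $buf;
    while (read($fh, $buf, $block_size) > 0) {
        $pending .= $buf;
        my $last_nl = rindex($pending, "\n");
        next if $last_nl < 0;                   # no complete line yet
        # Remove everything up to and including the last newline.
        my $complete = substr($pending, 0, $last_nl + 1, '');
        $callback->($_) for $complete =~ /([^\n]*\n)/g;
    }
    $fh->close;
    $callback->($pending) if length $pending;   # last line had no trailing "\n"
    return;
}
```

Usage, with a placeholder file name: `each_line_blocked('data.gz', sub { my $line = shift; ... })`. Each line is passed with its trailing newline, as `<$fh>` would return it.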
From: jesper [...] krogh.cc
Hi Paul,

thanks a lot for the prompt response. Tools::Timer is just a wrapper around Time::HiRes, so no magic there. I wasn't aware that line-based reading is so much more expensive in Perl.

Thanks,
Jesper