CC: | "Givan, Scott A." <givans [...] missouri.edu>, "Spollen, William G." <spollenw [...] missouri.edu> |
Subject: | Failure to read second "original" gzipped file inside a "concatenated" gzipped file |
Date: | Thu, 8 Dec 2016 18:08:21 +0000 |
To: | "bug-IO-Compress [...] rt.cpan.org" <bug-IO-Compress [...] rt.cpan.org> |
From: | "Bottoms, Christopher A" <BottomsC [...] missouri.edu> |
(This is mostly copied from my post on Stack Overflow <http://stackoverflow.com/questions/41045834>.)
In bash, you can concatenate gzipped files and the result is a valid gzipped file. As far as I recall, I have always been able to treat these "concatenated" gzipped files as normal gzipped files:
echo 'Hello world!' > hello.txt
echo 'Howdy world!' > howdy.txt
gzip hello.txt
gzip howdy.txt
cat hello.txt.gz howdy.txt.gz > greetings.txt.gz
gunzip -c greetings.txt.gz > greetings.txt
cat greetings.txt
This outputs:
Hello world!
Howdy world!
However, when I try to read this same file using Perl's core IO::Uncompress::Gunzip module <https://metacpan.org/pod/IO::Uncompress::Gunzip>, it stops after the contents of the first original file. Here is the result:
./my_zcat greetings.txt.gz
Hello world!
Here is the code for my_zcat:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use IO::Uncompress::Gunzip qw($GunzipError);
my $file_name = shift;
# Open the gzipped file with the module's default options
my $fh = IO::Uncompress::Gunzip->new($file_name) or die $GunzipError;
# Print each decompressed line
while (defined(my $line = readline $fh))
{
    print $line;
}
If I totally decompress the files before creating a new gzipped file, I don't have this problem:
zcat hello.txt.gz howdy.txt.gz | gzip > greetings_via_zcat.txt.gz
./my_zcat greetings_via_zcat.txt.gz
Hello world!
Howdy world!
So, what is the difference between greetings.txt.gz and greetings_via_zcat.txt.gz, and why does IO::Uncompress::Gunzip read all of greetings_via_zcat.txt.gz but only the first part of greetings.txt.gz?
I'm guessing that IO::Uncompress::Gunzip stops because of the gzip header and trailer metadata that sits between the two original files. But since greetings.txt.gz is a valid gzip file, I would expect IO::Uncompress::Gunzip to read all of it.
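One way to check that guess is to count the gzip members in each file. The following is only a sketch: count_gzip_members is a hypothetical helper name, and it assumes the module's nextStream method steps from one member to the next as its documentation describes. If the guess is right, greetings.txt.gz should hold two members and greetings_via_zcat.txt.gz only one.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: count the gzip members (streams) in a file.
sub count_gzip_members {
    my ($file_name) = @_;
    my $z = IO::Uncompress::Gunzip->new($file_name)
        or die "Cannot open $file_name: $GunzipError";
    my ($count, $buffer) = (0, '');
    my $status = 1;
    while ($status > 0) {
        $count++;
        # Drain the current member before asking for the next one.
        while (($status = $z->read($buffer)) > 0) { }
        die "Error reading $file_name: $GunzipError" if $status < 0;
        # Assumed behavior: 1 if another member follows, 0 if none, -1 on error.
        $status = $z->nextStream();
    }
    die "Error in $file_name: $GunzipError" if $status < 0;
    $z->close();
    return $count;
}

# Report the member count for each file named on the command line.
print count_gzip_members($_), " member(s) in $_\n" for @ARGV;
```

It could be run as: perl count_members.pl greetings.txt.gz greetings_via_zcat.txt.gz (the script name is also just an example).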
My workaround for now will be piping from zcat (which of course doesn't help Windows users much):
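#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
my $file_name = shift;
# List-form open avoids passing the file name through the shell
open(my $fh, '-|', 'zcat', $file_name) or die "Cannot run zcat: $!";
while (defined(my $line = readline $fh))
{
    print $line;
}
# A failed close on a pipe means zcat exited with an error
close($fh) or die "zcat failed on $file_name";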
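A possible alternative that does not depend on an external zcat is the MultiStream option listed in the IO::Uncompress::Gunzip documentation. This is a sketch under the assumption that MultiStream => 1 makes the module continue into subsequent gzip members; read_all_members is a hypothetical helper name, and I am not claiming this is the intended fix for the behavior reported here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: return every line from every gzip member in a file.
sub read_all_members {
    my ($file_name) = @_;
    # MultiStream => 1 asks the module to treat concatenated members
    # as one continuous stream instead of stopping after the first.
    my $fh = IO::Uncompress::Gunzip->new($file_name, MultiStream => 1)
        or die "Cannot open $file_name: $GunzipError";
    my @lines;
    while (defined(my $line = readline $fh)) {
        push @lines, $line;
    }
    $fh->close();
    return @lines;
}

# Behave like my_zcat when given a file name on the command line.
print read_all_members($ARGV[0]) if @ARGV;
```

Since this stays inside Perl, it should also be usable on Windows, where zcat is usually not available.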