
This queue is for tickets about the PerlIO-gzip CPAN distribution.

Report information
The Basics
Id: 122722
Status: open
Priority: 0/
Queue: PerlIO-gzip

People
Owner: Nobody in particular
Requestors: eclipsechasers2 [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Windows Bug When Using PerlIO::gzip To Read "Large" Text File
Date: Sun, 6 Aug 2017 06:55:04 +0000 (UTC)
To: "bug-PerlIO-gzip [...] rt.cpan.org" <bug-PerlIO-gzip [...] rt.cpan.org>
From: Owen Leibman <eclipsechasers2 [...] yahoo.com>
In Windows, the gzip layer appears to break down when reading "large" gzipped text files, where large is not particularly large (in particular nowhere near 2GB). The problem being reported does not occur in Unix, including Cygwin.

My system is Windows 7 Professional 64-bit.
My Perl distribution is Strawberry Perl, Perl version 5.20.2.
Gzip from GnuWin32 is in my Windows path.
PerlIO::gzip is version 0.20.

Here is a program that creates a text file which demonstrates the problem:

    use strict;
    use warnings;
    use Carp;
    use English qw(-no_match_vars);
    use Readonly;
    Readonly::Scalar my $ROWS => 1_000;
    Readonly::Scalar my $COLS => 50;
    Readonly::Scalar my $RANDMAX => 1_500_000_000;
    Readonly::Scalar my $RANDSUB => 500_000_000;
    Readonly::Scalar my $OUTFILE => 'perlio.csv';
    open(my $filex, q{>:raw}, $OUTFILE) or croak "$ERRNO";
    foreach my $row (1 .. $ROWS) {
        foreach my $col (1 .. $COLS) {
            print {$filex} sprintf(q{%11d,}, rand($RANDMAX) - $RANDSUB) or croak "$ERRNO";
        }
        printf {$filex} "\r\n" or croak "$ERRNO";
    }
    close $filex or croak "$ERRNO";
    print "Created $OUTFILE\n" or croak "$ERRNO";

I include "raw" and "\r" just to make sure this wasn't a line-ending problem.

Once the file is created, I gzip it, and then run the following:

    use strict;
    use warnings;
    use Carp;
    use English qw(-no_match_vars);
    use Readonly;
    use PerlIO::gzip;
    Readonly::Scalar my $INFILE => 'perlio.csv.gz';
    sub processfile {
        my ($infile) = @_;
        my $recsread = 0;
        open(my $filex, q{<:gzip}, $infile) or croak "$ERRNO";
        while (my $rec = <$filex>) {
            $recsread += 1;
            print "rec $recsread size=", length($rec), "\n" or croak "$ERRNO";
        }
        print 'Records read ', $recsread, "\n" or croak "$ERRNO";
        close $filex or croak "$ERRNO";
        return;
    }
    processfile($INFILE);

In Cygwin and Linux, this correctly shows 1,000 lines each 602 bytes long.
In Windows, this shows 808 lines of various lengths (and fails on the close).
The first 79 lines show the expected length; line 80 shows 1216, line 81 shows 3479, and the rest show values both higher and lower than expected.
Hi,

This is not a problem with PerlIO::gzip but with the way you open the file for reading. On Windows, you also need to open the file with ':raw' or apply binmode() to the filehandle. Then you can also read gzipped files directly:

    open(my $filex, q{<:raw:gzip}, $infile) or croak "$ERRNO";

Otherwise, Perl on Windows will still use the :crlf layer that messes things up.

-max

On Sun, 06 Aug 2017, 02:58:25, eclipsechasers2@yahoo.com wrote:
> In Windows, the gzip layer appears to break down
> when reading "large" gzipped text files,
> where large is not particularly large (in particular nowhere near
> 2GB).
> The problem being reported does not occur in Unix, including Cygwin.
>
> My system is Windows 7 Professional 64-bit.
> My Perl distribution is Strawberry Perl, Perl version 5.20.2.
> Gzip from GnuWin32 is in my Windows path.
> PerlIO::gzip is version 0.20.
>
> Here is a program that creates a text file which demonstrates the
> problem:
>
> use strict;
> use warnings;
> use Carp;
> use English qw(-no_match_vars);
> use Readonly;
> Readonly::Scalar my $ROWS => 1_000;
> Readonly::Scalar my $COLS => 50;
> Readonly::Scalar my $RANDMAX => 1_500_000_000;
> Readonly::Scalar my $RANDSUB => 500_000_000;
> Readonly::Scalar my $OUTFILE => 'perlio.csv';
> open(my $filex, q{>:raw}, $OUTFILE) or croak "$ERRNO";
> foreach my $row (1 .. $ROWS) {
>     foreach my $col (1 .. $COLS) {
>         print {$filex} sprintf(q{%11d,}, rand($RANDMAX) - $RANDSUB) or
>             croak "$ERRNO";
>     }
>     printf {$filex} "\r\n" or croak "$ERRNO";
> }
> close $filex or croak "$ERRNO";
> print "Created $OUTFILE\n" or croak "$ERRNO";
>
> I include "raw" and "\r" just to make sure this wasn't a line-ending
> problem.
>
> Once the file is created, I gzip it, and then run the following:
>
> use strict;
> use warnings;
> use Carp;
> use English qw(-no_match_vars);
> use Readonly;
> use PerlIO::gzip;
> Readonly::Scalar my $INFILE => 'perlio.csv.gz';
> sub processfile {
>     my ($infile) = @_;
>     my $recsread = 0;
>     open(my $filex, q{<:gzip}, $infile) or croak "$ERRNO";
>     while (my $rec = <$filex>) {
>         $recsread += 1;
>         print "rec $recsread size=", length($rec), "\n" or croak
>             "$ERRNO";
>     }
>     print 'Records read ', $recsread, "\n" or croak "$ERRNO";
>     close $filex or croak "$ERRNO";
>     return;
> }
> processfile($INFILE);
>
> In Cygwin and Linux, this correctly shows 1,000 lines each 602 bytes
> long.
> In Windows, this shows 808 lines of various lengths (and fails on the
> close).
> The first 79 lines show the expected length; line 80 shows 1216,
> line 81 shows 3479, and the rest show values both higher and lower
> than expected.
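
The reply's point about the :crlf layer can be illustrated without PerlIO::gzip at all. The following is only a minimal sketch under stated assumptions: the 8-byte payload is a made-up stand-in for compressed data (it is not a valid gzip stream), and the helper name slurp_with is hypothetical. It pushes :crlf explicitly, so it should behave the same way on Unix and on Windows. Reading binary bytes through :crlf turns every CR LF pair into a single LF, which is the kind of damage a decompressor stacked on top of that layer would then receive.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical payload standing in for compressed data: it begins with the
# gzip magic bytes and happens to contain two CR LF (0x0d 0x0a) pairs.
my $payload = "\x1f\x8b\x0d\x0a\x00\x0d\x0a\xff";

# Write the bytes to a temporary file with no translation applied.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $payload or die "write: $!";
close $tmp or die "close: $!";

# Read the whole file back through the given layer string.
sub slurp_with {
    my ($layers) = @_;
    open(my $fh, "<$layers", $path) or die "open: $!";
    local $/;    # slurp mode: read everything in one go
    my $data = <$fh>;
    close $fh or die "close: $!";
    return $data;
}

my $raw  = slurp_with(':raw');     # bytes as stored on disk
my $crlf = slurp_with(':crlf');    # CR LF pairs collapsed to LF

printf "raw:  %d bytes\n", length $raw;
printf "crlf: %d bytes\n", length $crlf;
```

The :raw read should return all 8 bytes, while the :crlf read should return 6, one byte lost per CR LF pair. In a real gzip file such pairs occur at arbitrary offsets, which would be consistent with the report that the first lines decode correctly and later ones do not.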