
This queue is for tickets about the File-Slurp CPAN distribution.

Report information
The Basics
Id: 127329
Status: resolved
Priority: 0/
Queue: File-Slurp

People
Owner: cwhitener [...] gmail.com
Requestors: dan.bolser [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: File::Slurp seems to choke on files > 4Gb
From: dan.bolser [...] gmail.com
Sorry, accidentally created without content, my bad.

So I'm using this version of File::Slurp:

    $ perl -MFile::Slurp -e 'print $File::Slurp::VERSION ."\n";'
    9999.21

And I'm creating a 'large' file like this:

    time dd if=/dev/zero of=4Gb.txt count=4096 bs=1048576

and calling File::Slurp like this:

    time perl -MFile::Slurp -e 'my $text = read_file( "4Gb.txt" ) ;'

I'm seeing this error (after about 5 minutes):

    Offset outside string at /hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5/File/Slurp.pm line 234.

This works fine on files of about 1 and 2 GB.
I created a file just shy of 4 GB like this:

    time dd if=/dev/zero of=4Gb-shy.txt count=4095 bs=1048576

and observed no error when I ran this:

    time perl -MFile::Slurp -e 'my $text = read_file( "4Gb-shy.txt" ) ;'

Forgot to mention: this is perl 5, version 14, subversion 4 (v5.14.4) built for x86_64-linux-thread-multi-ld (with 1 registered patch), on:

    Linux xxx 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 GNU/Linux
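For reference, the two dd commands produce files of 4096 × 1048576 = 4,294,967,296 bytes (exactly 2**32) and 4095 × 1048576 = 4,293,918,720 bytes. In other words, the boundary between failing and working sits at 2**32. The short Perl sketch below just does that arithmetic on a 64-bit perl. The final two lines illustrate one possible failure mode: a size or offset held in an unsigned 32-bit field. That is a hypothesis only and is never confirmed in this ticket.

```perl
use strict;
use warnings;

# dd writes count * bs bytes; bs=1048576 is 1 MiB.
my $mib     = 1048576;
my $failing = 4096 * $mib;    # 4Gb.txt, which fails to slurp
my $working = 4095 * $mib;    # 4Gb-shy.txt, which slurps fine

printf "4Gb.txt     %d bytes\n", $failing;
printf "4Gb-shy.txt %d bytes\n", $working;
printf "2**32       %.0f bytes\n", 2**32;

# Illustration only (unconfirmed hypothesis): if some size or offset were
# kept in an unsigned 32-bit field, the failing size would wrap to 0
# while the working size would pass through unchanged.
printf "failing mod 2**32: %d\n", $failing & 0xFFFFFFFF;
printf "working mod 2**32: %d\n", $working & 0xFFFFFFFF;
```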
Pasting perl -V...

    Summary of my perl5 (revision 5 version 14 subversion 4) configuration:
      Platform:
        osname=linux, osvers=3.10.0-327.18.2.el7.x86_64, archname=x86_64-linux-thread-multi-ld
        uname='linux ebi-cli-002.ebi.ac.uk 3.10.0-327.18.2.el7.x86_64 #1 smp fri apr 8 05:09:53 edt 2016 x86_64 x86_64 x86_64 gnulinux '
        config_args='-Dprefix=/nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4 -de -Dusedevel -Dusethreads -Duseshrplib -Duselargefiles -Duse64bitint -Duse64bitall -Duselongdouble -Dusemultiplicity -Accflags=-fPIC -UDEBUGGING -A'eval:scriptdir=/nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4/bin''
        hint=recommended, useposix=true, d_sigaction=define
        useithreads=define, usemultiplicity=define
        useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
        use64bitint=define, use64bitall=define, uselongdouble=define
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fPIC -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O2',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -fPIC -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
        ccversion='', gccversion='5.3.0', gccosandvers=''
        intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
        ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
        alignbytes=16, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
        libpth=/usr/local/lib /lib/../lib64 /usr/lib/../lib64 /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64
        libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
        libc=libc-2.17.so, so=so, useshrplib=true, libperl=libperl.so
        gnulibc_version='2.19'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4/lib/perl5/5.14.4/x86_64-linux-thread-multi-ld/CORE'
        cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

    Characteristics of this binary (from libperl):
      Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
                            PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                            PERL_PRESERVE_IVUV PERL_USE_DEVEL USE_64_BIT_ALL
                            USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
                            USE_LONG_DOUBLE USE_PERLIO USE_PERL_ATOF
                            USE_REENTRANT_API
      Locally applied patches:
        Devel::PatchPerl 1.48
      Built under linux
      Compiled at Jul 25 2017 10:14:00
      %ENV:
        PERL5LIB="/hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5"
        PERL_LOCAL_LIB_ROOT="/hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib"
        PERL_MB_OPT="--install_base "/hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib""
        PERL_MM_OPT="INSTALL_BASE=/hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib"
      @INC:
        /hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5/5.14.4/x86_64-linux-thread-multi-ld
        /hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5/5.14.4
        /hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5/x86_64-linux-thread-multi-ld
        /hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5
        /nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4/lib/perl5/site_perl/5.14.4/x86_64-linux-thread-multi-ld
        /nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4/lib/perl5/site_perl/5.14.4
        /nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4/lib/perl5/5.14.4/x86_64-linux-thread-multi-ld
        /nfs/software/ensembl/RHEL7-JUL2017-core2/plenv/versions/5.14.4/lib/perl5/5.14.4
        .
Subject: Re: [rt.cpan.org #127329] File::Slurp seems to choke on files > 4Gb
Date: Wed, 10 Oct 2018 10:48:51 -0400
To: bug-File-Slurp [...] rt.cpan.org
From: Uri Guttman <uri [...] stemsystems.com>
On 10/10/2018 05:33 AM, Dan Bolser via RT wrote:
> Wed Oct 10 05:33:43 2018: Request 127329 was acted upon.
> Transaction: Ticket created by dan.bolser
> Queue: File-Slurp
> Subject: File::Slurp seems to choke on files > 4Gb
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: dan.bolser@gmail.com
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=127329 >
>
> This transaction appears to have no content
regardless, slurping 4GB files is stupid.

uri
Subject: Re: [rt.cpan.org #127329] File::Slurp seems to choke on files > 4Gb
Date: Wed, 10 Oct 2018 10:53:12 -0400
To: bug-File-Slurp [...] rt.cpan.org
From: Uri Guttman <uri [...] stemsystems.com>
On 10/10/2018 05:37 AM, Dan Bolser via RT wrote:
> I'm seeing this error (after about 5mins):
>
> Offset outside string at /hps/cstor01/nobackup/crop_genomics/dbolser/GenomeLoader/GCA_002575655.1/local-lib/lib/perl5/File/Slurp.pm line 234.
>
> This works fine on files about 1 and 2Gb in size.
why do you want to slurp files that large? the fact that it takes 5 minutes means you are likely thrashing the disk and ram. the 1-2 GB files work because you have enough free ram to slurp the whole file.

also are you using a 32 bit perl? that might be the cause of the error. the module will be accessing part of a string longer than allowed by perl (31 bits i think which is 2GB).

uri
On Wed Oct 10 10:53:23 2018, uri@stemsystems.com wrote:
> why do you want to slurp files that large?

Oh, no reason.

> the fact that it takes 5 minutes means you are likely thrashing the disk and ram.

Yeah.

> the 1-2 GB files work because you have enough free ram to slurp the whole file.

I don't think so. It fails reliably when it's even fractionally above 4 GiB and works reliably when it's even fractionally below 4 GiB. This machine has about 90 GiB free (of 250 GiB total).

> also are you using a 32 bit perl? that might be the cause of the error.

I think the perl -V shows that it's 64-bit. As you mention, 32-bit had a 2 GiB limit.

> the module will be accessing part of a string longer than allowed by
> perl (31 bits i think which is 2GB).
>
> uri
Seems File::Slurper does the job.
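The 32- vs 64-bit question above can be answered without scanning the whole perl -V dump. The core Config module exposes the same build settings. The sketch below prints the integer, pointer and file-offset sizes plus the large-file flag. For the perl in this ticket, perl -V reports ivsize=8, ptrsize=8, lseeksize=8 and uselargefiles=define. Treating "ivsize and ptrsize both at least 8" as "64-bit" is a simplifying assumption made for this example.

```perl
use strict;
use warnings;
use Config;

# Build settings relevant to large strings and large files (sizes in bytes).
my %info = (
    ivsize        => $Config{ivsize},     # size of perl's native integer
    ptrsize       => $Config{ptrsize},    # size of a C pointer
    lseeksize     => $Config{lseeksize},  # size of file offsets
    uselargefiles => $Config{uselargefiles} // 'undef',
);

for my $key (sort keys %info) {
    printf "%-13s %s\n", $key, $info{$key};
}

# Simplifying assumption: 8-byte integers and pointers mean a 64-bit perl.
my $is_64bit = $Config{ivsize} >= 8 && $Config{ptrsize} >= 8;
print $is_64bit ? "64-bit perl\n" : "32-bit (or mixed) perl\n";
```

A single value can also be queried from the shell with `perl -V:ivsize`.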
On Wed Oct 10 10:49:12 2018, uri@stemsystems.com wrote:
> regardless, slurping 4GB files is stupid.
I agree. But it's not fundamentally more stupid than slurping a file that is 3.999999 GiB, i.e. this is a bug, even if what I'm doing is stupid.
Subject: Re: [rt.cpan.org #127329] File::Slurp seems to choke on files > 4Gb
Date: Wed, 10 Oct 2018 11:15:08 -0400
To: bug-File-Slurp [...] rt.cpan.org
From: Uri Guttman <uri [...] stemsystems.com>
On 10/10/2018 11:08 AM, Dan Bolser via RT wrote:
>
>> the fact that it takes 5 minutes means you are likely thrashing the disk and ram.
> Yeah.
>
>> the 1-2 GB files work because you have enough free ram to slurp the whole file.
> I don't think so. It fails reliably when it's even fractionally above 4Gb and works reliably when it's even fractionally below 4Gb.
>
> This machine has about 90GiB free (of 250 GiB total).

but we both agreed thrashing seems to be happening. very odd if you really have so much ram.

>> also are you using a 32 bit perl? that might be the cause of the error.
> I think the perl -V shows that it's 64bit. As you mention 32 bit had a 2 GiB limit.

i did see that in a later post.

>
>> the module will be accessing part of a string longer than allowed by
>> perl (31 bits i think which is 2GB).
>>
>> uri
>
> Seems File::Slurper does the job.
then there is likely a bug with an edge case on 4GB exactly. i can't test that as i don't have nearly that amount of free ram. the module loops on like 1MB chunks to slurp. you can set that chunk size with an option. try changing the blk_size option on read_file. try 1 byte less or more, much larger chunks (like 1GB), etc. it might provide clues as to why it fails at 4GB.

uri
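Uri's suggestion is easier to follow with the general shape of a chunked slurp in mind. The sketch below uses a hypothetical helper, `slurp_in_chunks`. It is not part of File::Slurp, and the module's real loop may differ. The helper appends each block to the buffer with 4-argument `sysread`, passing the current buffer length as OFFSET. According to perldiag, "Offset outside string" is the error perl raises when a read/write-style operation is given an offset that points outside the buffer. The demo reads a small temporary file with several block sizes, mirroring the blk_size experiment on a scale that fits anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, NOT File::Slurp's code: slurp a file by appending
# fixed-size chunks with 4-argument sysread.
sub slurp_in_chunks {
    my ($path, $blk_size) = @_;
    $blk_size //= 1024 * 1024;    # assumed default of 1 MiB

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buf = '';
    while (1) {
        # OFFSET = length($buf) makes each chunk land at the end of $buf.
        my $n = sysread $fh, $buf, $blk_size, length $buf;
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;    # 0 bytes read means end of file
    }
    close $fh;
    return $buf;
}

# Demo: write a 10,000-byte file, then read it back with several block sizes.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'x' x 10_000;
close $tmp_fh;

for my $blk (1, 7, 4096, 1024 * 1024) {
    my $data = slurp_in_chunks($tmp_path, $blk);
    printf "blk_size=%-8d read %d of %d bytes\n", $blk, length $data, -s $tmp_path;
}
```

With File::Slurp itself, the equivalent knob is the blk_size option Uri mentions, passed as an option to read_file alongside the file name.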
Hi Dan,

I'm glad you were able to find a solution that works for you. I have a machine I can test on with plenty of RAM. I'll see if I can narrow down what's happening later on.

Just to let you know, so hopefully you're not waiting on us, I don't expect to have a workable solution anytime soon. The main focus right now is on documentation cleanup and then Perl 5.30 compliance. When those tasks are complete, hopefully only a few short weeks, I will definitely circle back around to this.

Thanks again,
Chase
Hi!

I believe this will be fixed for you now as the read_file function has been simplified quite a bit. I'll close this out for now on the assumption that 9999.26 should work.

Thanks,
Chase
Subject: Re: [rt.cpan.org #127329] File::Slurp seems to choke on files > 4Gb
Date: Wed, 13 Feb 2019 20:29:05 +0000
To: bug-File-Slurp [...] rt.cpan.org
From: Dan Bolser <dan.bolser [...] gmail.com>
Many thanks!

On Wed, 13 Feb 2019 6:01 pm Chase Whitener via RT <bug-File-Slurp@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=127329 >
>
> Hi!
>
> I believe this will be fixed for you now as the read_file function has
> been simplified quite a bit. I'll close this out for now in the assumption
> that 9999.26 should work.
>
> Thanks,
> Chase