
This queue is for tickets about the Forks-Super CPAN distribution.

Report information
The Basics
Id: 78285
Status: resolved
Priority: 0
Queue: Forks-Super

People
Owner: Nobody in particular
Requestors: tangent [...] west.etr-usa.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.63
Fixed in: 0.85



Subject: Various overload failures when Forks::Super is overloaded
We have seen at least three different symptoms when trying to use Forks::Super to run several (4-ish) parallel scp commands to transfer ~90,000 files from one machine to another. The core of our script's loop looks like this:

    $pid = fork {
        'exec'   => [ 'scp', $source, $destination ],
        callback => { # react to events here },
        on_busy  => 'queue',
        child_fh => 'join',
    };

The outer loop just gathers a list of files and calls the function containing this Forks::Super::fork() call. (Yes, we have basically reinvented rsync, with several improvements.)

We've seen at least two different errors from this script when using that particular formulation. The first is an occasional "double free" error, which unfortunately we didn't save, and haven't been able to reproduce. The second is a complaint from Perl itself that it can't find Scalar::Util, even though it is installed:

    Can't locate Scalar/Util.pm in @INC (@INC contains:
    /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl
    /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl
    /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .)
    at /usr/lib/perl5/5.8.8/overload.pm line 88.

We've verified the module's correct installation with perl -e "use Scalar::Util"; it exits with status 0.

In private email to MOB on this topic, he advised me to try changing the on_busy => 'queue' parameter to 'block', since our script doesn't have any other work to do while it is busy managing queued scp commands. That allowed the script to run for several hundred commands before it started yelling:

    Too many open files while opening < /home/etr/.fhfork26833/.fh_2083.
    [openfh=1018/1020] at
    /usr/lib/perl5/site_perl/5.8.8/Forks/Super/Job.pm line 1488.

I didn't check exactly how many files it had copied, but I wouldn't be surprised if it were around 1020, that being just under the default 1024 per-user file handle limit on Linux. I suspect there is a file handle not being close()'d explicitly in 'block' mode, and someone is keeping a reference to the file so the Perl GC isn't closing it automatically.
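For reference, a minimal self-contained sketch of the loop described above; the file list, destination host, and paths here are placeholders, not taken from the original script:

    use strict;
    use warnings;
    use Forks::Super MAX_PROC => 4;   # cap at ~4 concurrent scp jobs

    # placeholder file list; the real script gathers ~90,000 files
    my @files = glob('/data/outgoing/*');

    for my $source (@files) {
        my $pid = fork {
            exec     => [ 'scp', $source, 'backup-host:/data/incoming/' ],
            on_busy  => 'queue',      # or 'block'; both accumulate handles here
            child_fh => 'join',       # opens a filehandle per job
        };
    }
    waitall;                          # reap everything before exiting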
Subject: Various failures when Forks::Super is overloaded
From: tangent [...] west.etr-usa.com
On Mon Jul 09 22:29:58 2012, etrwest wrote:
> The second is a complaint from Perl itself that it can't find
> Scalar::Util, even though it is installed:
[snip]
> I didn't check exactly how many files it had copied, but I wouldn't be
> surprised if it were around 1020, that being just under the default 1024
> per-user file handle limit on Linux.
In a bit of luck, I just saw the Scalar::Util error again, and checked the file count: it had copied another ~1024 files. So, I suspect this is happening for the same reason the third error I reported above is happening. Perl can't load the referenced module because the kernel won't let it have another file handle to do so. This explains why we get this error despite the fact that the module is in fact in the @INC path. So, we really only have two problems. Here's hoping we can catch the double-free again. :)
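One cheap way to confirm this diagnosis is to count the process's open descriptors shortly before the error appears. A sketch of such a check (Linux-only, and an assumption on my part, not part of the original script):

    # Linux-only: each entry in /proc/self/fd is one open descriptor.
    # Compare the count against the soft limit from `ulimit -n` (often 1024).
    my $open_fds = () = glob('/proc/self/fd/*');
    warn "open filehandles: $open_fds\n";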
Hi Warren,

That was a good catch about the Scalar::Util problem. When you pass the child_fh => 'join' argument to fork, you are opening a filehandle that doesn't get closed unless you call $job->close_fh or $job->dispose on the finished job. You could put that in a callback, like

    fork {
        cmd      => [ ... ],
        child_fh => 'join',
        callback => { finish => sub { $_[0]->close_fh }, ... },
    };

but there probably ought to be an option or a convenience method to clean up long-dead processes and recover those filehandles.
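For jobs reaped in a loop rather than through callbacks, the same cleanup can be done after the fact. A sketch, assuming Forks::Super's overridden wait and the Forks::Super::Job::get lookup to map a pid back to its job:

    use Forks::Super;

    # reap finished children and release their filehandles as we go
    while ((my $pid = wait) > 0) {
        my $job = Forks::Super::Job::get($pid);
        $job->dispose if $job;   # dispose closes filehandles and discards the job
    }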
In 0.65, I included a few more admonitions to manage your open filehandle resources wisely, and took the symptoms of this bug report and explained how they were related to having too many open filehandles. Ultimately, this module should monitor when open filehandles are getting scarce, and be able (if configured for it) to close off and recycle filehandles from old, finished jobs.
Made another half-assed attempt to address this issue in v0.74, with the $Forks::Super::ON_TOO_MANY_OPEN_FILEHANDLES variable. The default value is 'fail', which produces the old behavior. If you anticipate that your script will use a lot of file handles, you should set this variable to 'rescue', either at run time with an assignment like

    $Forks::Super::ON_TOO_MANY_OPEN_FILEHANDLES = 'rescue';

or at import time like

    use Forks::Super ON_TOO_MANY_OPEN_FILEHANDLES => 'rescue';

With the 'rescue' setting, as you run out of available filehandles, the module will take a guess at which ones you don't need anymore (say, from long-completed jobs) and free some up.
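Putting it together, a minimal sketch of the import-time form; the job details are placeholders, not from the original report:

    use Forks::Super MAX_PROC => 4,
                     ON_TOO_MANY_OPEN_FILEHANDLES => 'rescue';

    # spawn far more jobs than the per-process filehandle limit allows;
    # 'rescue' reclaims handles from long-finished jobs as needed
    for my $n (1 .. 5000) {
        fork { cmd => [ 'echo', $n ], child_fh => 'join', on_busy => 'block' };
    }
    waitall;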
I'm somewhat satisfied with the way the $Forks::Super::ON_TOO_MANY_OPEN_FILEHANDLES = 'rescue' setting works, though if you have the presence of mind to set $Forks::Super::ON_TOO_MANY_OPEN_FILEHANDLES, then you also should have the presence of mind to close your children's open file handles. There's still a test issue with this feature on MSWin32 (which I expect to resolve in 0.86), but I'll mark this as fixed.