Subject: Various overload failures when Forks::Super is overloaded
We have seen at least three different symptoms when trying to use
Forks::Super to run several (4-ish) parallel scp commands to transfer
~90,000 files from one machine to another.
The core of our script's loop looks like this:
    $pid = fork {
        'exec'   => [ 'scp', $source, $destination ],
        callback => {
            # react to events here
        },
        on_busy  => 'queue',
        child_fh => 'join',
    };
The outer loop just gathers a list of files and calls the function
containing this Forks::Super::fork() call. (Yes, we have basically
reinvented rsync, with several improvements.)
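For context, the surrounding loop is roughly the following sketch. The transfer_file() wrapper, the MAX_PROC setting, and the use of @ARGV as the file list are simplified stand-ins for our actual script, not its real code:

```perl
use strict;
use warnings;
use Forks::Super MAX_PROC => 4;   # run up to 4 scp jobs at once

# Hypothetical wrapper around the fork {} call shown above.
sub transfer_file {
    my ($source, $destination) = @_;
    return fork {
        'exec'   => [ 'scp', $source, $destination ],
        on_busy  => 'queue',
        child_fh => 'join',
    };
}

# Stand-in for the code that gathers the ~90,000 files.
my @files_to_copy = @ARGV;

for my $file (@files_to_copy) {
    transfer_file( $file, "remote:$file" );
}
waitall;    # reap every child before exiting
```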
We've seen at least two different errors from this script when using
that particular formulation.
The first is an occasional "double free" error, which unfortunately we
didn't save, and haven't been able to reproduce.
The second is a complaint from Perl itself that it can't find
Scalar::Util, even though it is installed:
Can't locate Scalar/Util.pm in @INC (@INC contains:
/usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl
/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .)
at /usr/lib/perl5/5.8.8/overload.pm line 88.
We've verified that the module is correctly installed with perl -e "use
Scalar::Util", which exits with status 0.
In private email on this topic, MOB advised me to change the
on_busy => 'queue' parameter to 'block', since our script has no other
work to do while it is busy managing queued scp commands. That change
allowed the script to run for several hundred commands before it failed
with:
Too many open files while opening < /home/etr/.fhfork26833/.fh_2083.
[openfh=1018/1020] at /usr/lib/perl5/site_perl/5.8.8/Forks/Super/Job.pm
line 1488.
I didn't check exactly how many files it had copied, but I wouldn't be
surprised if it were around 1020, that being just under the default 1024
per-process file descriptor limit on Linux. I suspect a file handle is
not being close()'d explicitly in 'block' mode, and something is holding
a reference to it, so Perl's garbage collector never closes it
automatically.
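If that guess is right, one workaround might be to close each job's filehandles explicitly once the job is reaped, rather than waiting for garbage collection. This is only a sketch, assuming Forks::Super::close_fh() releases a job's joined handle as its documentation suggests:

```perl
# ... inside the per-file loop ...
my $pid = fork {
    'exec'   => [ 'scp', $source, $destination ],
    on_busy  => 'block',
    child_fh => 'join',
};

# Once the job has been reaped, release its filehandle explicitly
# so the open-file count stays bounded (assumption: close_fh frees
# the handle immediately instead of leaving it to the GC).
my $reaped = wait;
Forks::Super::close_fh($reaped);
```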