Subject: Parallel::ForkManager loops in wait_all_children
Date: Tue, 26 Aug 2008 11:03:38 +0200
To: bug-Parallel-ForkManager [...] rt.cpan.org
From: Frederik Ramm <frederik [...] remote.org>
Hi,
I have a problem under Linux where a rather complex script I wrote
sometimes hangs (in a tight loop) when it runs wait_all_children. I
cannot reproduce it with a test script; it only happens in production
and only sometimes! I'm not doing anything strange, just instantiating a
ForkManager, then every now and then doing a "start" and "finish". No
callbacks, nothing.
Running strace on a hung process shows that it continuously calls "wait4",
which fails with ECHILD (no children to wait for).
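The condition seen in strace can be checked from Perl directly. The following is only a minimal sketch, not code from the module; the helper name no_children_left is made up for illustration. It uses a non-blocking waitpid and treats a return value of -1 with $! set to ECHILD as "this process has no children at all".

```perl
use strict;
use warnings;
use POSIX qw(:errno_h :sys_wait_h);

# Hypothetical helper: true when the kernel reports that this process
# has no children left to wait for (waitpid fails with ECHILD).
sub no_children_left {
    my $pid = waitpid(-1, WNOHANG);
    return ($pid == -1 && $! == ECHILD) ? 1 : 0;
}

print no_children_left()
    ? "ECHILD: nothing to wait for\n"
    : "children remain\n";
```

Note that $! is only checked when waitpid itself returned -1, since $! is not reset by a successful call.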
I inspected the source and I believe it must somehow have missed a
SIGCHLD so that it thinks there are still child processes while in fact
there aren't.
I will now try and fix it by changing wait_all_children thus:
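    # Needs ECHILD in scope, e.g. via "use POSIX qw(:errno_h);"
    # or "use Errno qw(ECHILD);".
    sub wait_all_children {
        my ($s) = @_;
        while (keys %{ $s->{processes} }) {
            $s->on_wait;
            $s->wait_one_child(defined $s->{on_wait_period} ? &WNOHANG : undef);
            # No children left to wait for: drop the stale bookkeeping and stop.
            if ($! == ECHILD) {
                delete $s->{processes};
                last;
            }
        }
    }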
Of course this is a very brutal way to do it; it would be better not to
miss the SIGCHLD in the first place, but at least I hope my program can
continue this way.
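The workaround can be tried outside the module with a small stand-alone script. This is a sketch under assumptions, not the module's code: %processes is a hypothetical stand-in for $s->{processes}, drain_children is an invented name, and the "missed" child is simulated by reaping it before the drain loop runs. It checks $! only when waitpid actually failed, which avoids acting on a stale $! left over from an earlier call.

```perl
use strict;
use warnings;
use POSIX qw(:errno_h :sys_wait_h);

# Reap children until the bookkeeping hash is empty. If the kernel reports
# ECHILD first, the remaining entries are stale: count them, clear them, stop.
sub drain_children {
    my ($procs) = @_;
    my $stale = 0;
    while (keys %$procs) {
        my $pid = waitpid(-1, 0);
        if ($pid > 0) {
            delete $procs->{$pid};      # normal case: a child was reaped
        }
        elsif ($! == ECHILD) {
            $stale = keys %$procs;      # scalar context: number of entries
            %$procs = ();
            last;
        }
        # any other failure (e.g. EINTR): just try again
    }
    return $stale;
}

my %processes;                          # hypothetical stand-in for $s->{processes}
my $pid = fork();
die "fork failed: $!" unless defined $pid;
POSIX::_exit(0) if $pid == 0;           # child: exit at once, skipping END blocks
$processes{$pid} = 1;

waitpid($pid, 0);                       # reap the child without updating the hash
my $stale = drain_children(\%processes);
print "cleared $stale stale entry(ies)\n";
```

Without the ECHILD branch, the while loop in this script would spin forever, which matches the tight loop described above.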
Bye
Frederik