
This queue is for tickets about the Schedule-Cron CPAN distribution.

Report information
The Basics
Id: 56926
Status: open
Priority: 0/
Queue: Schedule-Cron

People
Owner: roland [...] cpan.org
Requestors: rene [...] margar.fr
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.00_1
Fixed in: (no value)



Subject: Program quits after 64 launches under Windows
Here is a fix to make Schedule::Cron work on Windows with the ActivePerl distribution (5.8.4 and above).

On Windows, fork is emulated with threads, and the handler defined for $SIG{'CHLD'} is never called. Perl cannot handle more than 64 "zombies" waiting for the parent to collect their return codes: when the 65th child terminates, the main program quits with no error message.

All we need to do is regularly acknowledge terminated subprocesses using the waitpid function. Under Windows, the REAPER function also has to deal with negative PIDs in order to clean the STARTEDCHILD hash.

As a patch, I am attaching replacement code for the functions REAPER and _cleanup_process_list. It makes the module work under both UNIX and Windows.
Subject: Cron_patch.txt
sub REAPER {
    if ($HAS_POSIX) {
        while (my $kid = waitpid(-1, WNOHANG)) {
            if (defined $STARTEDCHILD{$kid}) {
                $STARTEDCHILD{$kid} = 0;
                dbg "REAPER: Child $kid cleanned";
            }
        }
    } else {
        my $waitedpid = 0;
        while($waitedpid != -1) {
            $waitedpid = wait;
            if (defined $STARTEDCHILD{$waitedpid}) {
                $STARTEDCHILD{$waitedpid} = 0;
                dbg "REAPER: Child $waitedpid cleanned";
            }
        }
    }
}

sub _cleanup_process_list {
    REAPER() if ($HAS_POSIX);
    for my $k (keys %STARTEDCHILD) {
        delete $STARTEDCHILD{$k} unless $STARTEDCHILD{$k};
    }
}
Hi Rene, 
 
thanks for your patch. I 'optimized' it a bit (since some other
patches came in in the meantime, too) and put it into
a 1.01_1 release (I will publish it this evening). Since the patch
affects a very critical place (several tickets were open around
the signal handling), I would like to ask you to review my changes
and tell me whether you see any problems (and whether I understand the
issue correctly). This would be of great help to me.
 
If everything works well, I will put out a 1.01 release rather soon.
 
I attached my changes along with some comments, but you will soon be able
to look into 1.01_1 directly.

ciao ...
...roland

 


Changes to sub REAPER()
Subject: reaper.pl
sub REAPER {
    local ($!,%!);
    my $kid;
    do {
        # Only on POSIX systems the wait will return immediately only
        # if there are no finished child processes. Simple 'wait' will
        # wait blocking on childs.
        $kid = $HAS_POSIX ? waitpid(-1, WNOHANG) : wait;
        if ($kid > 0 && defined $STARTEDCHILD{$kid}) {
            # We don't delete the hash entry here to avoid an issue
            # when modifying a global hash from multiple threads
            $STARTEDCHILD{$kid} = 0;
        }
    } while ($kid > 0);
}

# Cleaning is done in extra method called from the main
# process in order to avoid event handlers modifying this
# global hash which can lead to memory errors.
# See RT #55741 for more details on this.
# This method is called in strategic places.
sub _cleanup_process_list {
    # Cleanup processes even on those systems, where the SIGCHLD is not
    # propagated. Only do this for POSIX, otherwise this call would block
    # until all child processes would have been finished.
    # See RT #56926 for more details.
    &REAPER() if $HAS_POSIX;

    # Delete entries from this global hash only from within the main
    # thread/process. Hence, this method must not be called from within
    # a signalhandler
    for my $k (keys %STARTEDCHILD) {
        delete $STARTEDCHILD{$k} unless $STARTEDCHILD{$k};
    }
}
From: rene [...] margar.fr
Thank you, I'm testing the new code. With my patch I had some trouble when calling REAPER in _cleanup_process_list with the nofork option, but with your new code it works. I don't know why yet.

The REAPER code must handle negative PIDs, because the Win32 port of Perl uses Windows threads to emulate the fork and waitpid functions, and such forked processes have negative PIDs. With a strict $kid > 0 check, the %STARTEDCHILD hash will never be cleared.

I'm still working on this and will give you my feedback.
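To illustrate the point about negative PIDs, here is a minimal sketch of a check that accepts any wait/waitpid return value except the two special ones (0 for "children still running" under WNOHANG, -1 for "no child processes"). The helper name is_reaped_pid, the %started hash and the PID values are hypothetical and only simulate what a reaper loop would see; this is not the module's code.

```perl
use strict;
use warnings;

# True if a wait/waitpid return value names a reaped child.
# 0 means "children still running" (WNOHANG), -1 means "no child processes".
# Everything else counts, including the negative pseudo-PIDs that the
# Windows fork emulation hands out.
sub is_reaped_pid {
    my ($kid) = @_;
    return ($kid != 0 && $kid != -1) ? 1 : 0;
}

# Hypothetical bookkeeping: one Unix-style PID and one Windows pseudo-PID,
# both marked as running (1).
my %started = (4711 => 1, -3120 => 1);

# Simulated sequence of waitpid(-1, WNOHANG) results, ending with -1.
for my $kid (-3120, 4711, -1) {
    last unless is_reaped_pid($kid);
    # Mark as finished; deletion is left to the main program flow.
    $started{$kid} = 0 if defined $started{$kid};
}
```

With a strict `$kid > 0` test instead, the loop would stop at the first value (-3120) and neither entry would ever be marked as finished.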
From: rene [...] margar.fr
The hang was caused by an infinite loop in the REAPER code when it was called by _cleanup_process_list under Windows, since PIDs can be negative there. I've changed the conditions in REAPER to accept negative PIDs, except -1, which is an error code. It now works fine on Windows and Unix, even when running in nofork mode. See the attached modified code.

Regards,
René
Subject: reaper.pl
sub REAPER {
    local ($!,%!);
    my $kid;
    do {
        # Only on POSIX systems the wait will return immediately only
        # if there are no finished child processes. Simple 'wait' will
        # wait blocking on childs.
        $kid = $HAS_POSIX ? waitpid(-1, WNOHANG) : wait;
        if ($kid != 0 && $kid != -1 && defined $STARTEDCHILD{$kid}) {
            # We don't delete the hash entry here to avoid an issue
            # when modifying a global hash from multiple threads
            $STARTEDCHILD{$kid} = 0;
            dbg "REAPER: Child $kid cleanned";
        }
    } while ($kid != 0 && $kid != -1);
}

# Cleaning is done in extra method called from the main
# process in order to avoid event handlers modifying this
# global hash which can lead to memory errors.
# See RT #55741 for more details on this.
# This method is called in strategic places.
sub _cleanup_process_list {
    # Cleanup processes even on those systems, where the SIGCHLD is not
    # propagated. Only do this for POSIX, otherwise this call would block
    # until all child processes would have been finished.
    # See RT #56926 for more details.
    &REAPER() if $HAS_POSIX;

    # Delete entries from this global hash only from within the main
    # thread/process. Hence, this method must not be called from within
    # a signalhandler
    for my $k (keys %STARTEDCHILD) {
        delete $STARTEDCHILD{$k} unless $STARTEDCHILD{$k};
    }
}
Thanks for the confirmation. However, I had to rework the code again, since
waiting on any PID doesn't work if cron jobs fork processes on their own.

E.g. if a job uses 'system()' and forks away, the reaper with 'waitpid(-1)' will
also wait on those processes, which in turn causes system() to return
with an error ("No child processes") since it couldn't reap the process
on its own.

So I reverted to the algorithm which only reaps known children (waitpid($pid)),
but still call the reaper from strategic points in the code (not only
from the SIGCHLD handler).
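
The "reap only known children" idea can be sketched roughly as follows. This is a minimal illustration and not the code released in 1.01_2: the names %started_child, reap_known_children and cleanup_process_list are hypothetical, and the sketch assumes that POSIX's WNOHANG is available.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical bookkeeping hash: PID => 1 while running, 0 once reaped.
my %started_child;

# Reap only the children we started ourselves, without blocking.
# Children forked by the jobs themselves (e.g. via system()) are never
# touched, so system() can still collect their exit status.
# Returns the list of PIDs that were marked as finished during this call.
sub reap_known_children {
    my @reaped;
    for my $pid (grep { $started_child{$_} } keys %started_child) {
        my $res = waitpid($pid, WNOHANG);
        # $res == $pid: the child has finished and was reaped.
        # $res == -1:   no such child (already reaped elsewhere).
        # $res == 0:    still running, so leave the entry alone.
        if ($res == $pid || $res == -1) {
            $started_child{$pid} = 0;
            push @reaped, $pid;
        }
    }
    return @reaped;
}

# Remove finished entries. Meant to be called from the main program flow,
# not from a signal handler.
sub cleanup_process_list {
    reap_known_children();
    delete @started_child{ grep { !$started_child{$_} } keys %started_child };
    return;
}
```

Because the comparison is against the stored PID rather than `> 0`, the same loop also copes with negative PIDs.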

I hope this still fixes the issue with Windows not calling the SIGCHLD handler, but
I can't verify this. Could you verify it? (I know, it took me some
time to come back to this bug again ;-). I just released version 1.01_2
with this new behaviour.

thanks ...
... roland

On Mon May 31 11:22:00 2010, rene@margar.fr wrote:
> Hang was caused by an infinite loop in REAPER code when called by
> _cleanup_process_list under Windows as PID can be negative. I've changed
> the conditions in REAPER to enable negative PID except -1 which is an
> error code.
>
> Now it works fine in Windows & Unix even when running in nofork mode.
>
> See attached modified code.
>
> Slts,
> René

Hi Rene, 

I know this ticket has been open for quite some time and Schedule::Cron has not been under
active development for more than a year now, but release 1.01 was pushed
to CPAN last week, which *should* fix this bug. However, I currently do not have the
proper environment for testing this, so I'd like to ask you whether you could please
check whether 1.01 really fixes this bug so that I can close this ticket.

If you don't have time for this, no problem, but please leave me a short note so that
I can try to verify it on my own (when I have time ;-)

ciao ...
... roland


From: rene [...] margar.fr
Hi Roland,

I was not available these last few weeks. I will take a look at your code update in the next few days and tell you what I see on my Windows platform.

Thank you,
René

(In reply to ROLAND's message of Fri 10 Jun 2011 02:08:45)
From: rene [...] margar.fr
I've been testing Schedule::Cron 1.01_3 intensively for several days. Here are my conclusions:

Using REAPER with _reaper_specific or _reaper_all makes no difference for my application (under Unix or Windows). I added a local SIGCHLD routine to handle my own forks; this is useful under Unix only, since Windows never calls $SIG{CHLD}.

I also tested Schedule::Cron with the nofork option (so no REAPER is used). It works fine too. I need the nofork option when the scheduled job needs to modify the current job list.

Windows negative PIDs are now handled correctly and the application can go beyond the limit of 64 launches. The bug is resolved.

The nostatus option is a great idea, as I needed to comment out the $0 alteration for my application (my monitoring system does not like a process name changing over time).

Thank you for your great job,
René

(In reply to rene@margar.fr's message of Tue 14 Jun 2011 05:30:57)