
This queue is for tickets about the Schedule-Cron CPAN distribution.

Report information
The Basics
Id: 55741
Status: resolved
Priority: 0/
Queue: Schedule-Cron

People
Owner: Nobody in particular
Requestors: develop [...] traveljury.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.99
Fixed in: (no value)



Subject: Working around a Perl bug
Hiya

I've reported this before, but you closed it because it is a Perl bug. I'm hoping that you can add a workaround though, as it is causing my daemon to die regularly.

The daemon dies with:

    panic: attempt to copy value 1 to a freed scalar 35befe0 at /opt/perl-5.8.9-unthreaded/lib/site_perl/5.8.9/Schedule/Cron.pm line 1064.

That's the line where it tries to set $STARTEDCHILD{$pid} = 1.

From what I've read, the problem is in the &REAPER, where it does:

    delete $STARTEDCHILD{$pid}

A workaround is suggested in http://www.perlmonks.org/index.pl?node_id=813655 which is to set $STARTEDCHILD{$pid} = 0, instead of deleting it:

diff -ruN Schedule-Cron-0.99_original/lib/Schedule/Cron.pm Schedule-Cron-0.99_patched/lib/Schedule/Cron.pm
--- Schedule-Cron-0.99_original/lib/Schedule/Cron.pm 2009-09-12 09:19:15.000000000 +0200
+++ Schedule-Cron-0.99_patched/lib/Schedule/Cron.pm 2010-03-20 16:26:02.000000000 +0100
@@ -130,7 +130,7 @@
             my $res = $HAS_POSIX ? waitpid($pid, WNOHANG) : waitpid($pid,0);
             if ($res > 0) {
                 # We reaped a truly running process
-                delete $STARTEDCHILD{$pid};
+                $STARTEDCHILD{$pid} = 0;
             }
         }
     }
@@ -772,6 +772,9 @@
         }
         $self->_execute($index,$cfg);
+        for (keys %STARTEDCHILD) {
+            delete $STARTEDCHILD{$_} unless $STARTEDCHILD{$_}
+        }
         if ($self->{entries_changed}) {
             dbg "rebuilding queue";

I don't know whether this will work or not, but I'm testing it in live as we speak.

I see from another bug that you are planning a 1.0 release soon, so please let me feed back to you on whether this fix works or not before you do release it. My daemon dies once every two days or more, so this should show up pretty quickly

thanks

Clint
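A minimal sketch of the workaround pattern in the patch above: the signal handler only overwrites values, and keys are deleted later from the main flow. This is not Schedule::Cron's actual code; %started_child, reaper() and sweep_reaped() are hypothetical names standing in for %STARTEDCHILD, &REAPER and the cleanup loop added after _execute.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical stand-in for %STARTEDCHILD: pid => 1 (running) or 0 (reaped).
my %started_child;

# Signal handler: overwrite values only, never delete keys here.
sub reaper {
    for my $pid (keys %started_child) {
        next unless $started_child{$pid};
        my $res = waitpid($pid, WNOHANG);
        $started_child{$pid} = 0 if $res > 0;   # mark, do not delete
    }
}

# Runs in the main flow, outside the handler: drop the marked entries.
sub sweep_reaped {
    for my $pid (keys %started_child) {
        delete $started_child{$pid} unless $started_child{$pid};
    }
}

$SIG{CHLD} = \&reaper;

for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;          # child: exit straight away
    $started_child{$pid} = 1;     # parent: record the child
}

# Poll as well, in case a SIGCHLD arrived before its pid was recorded.
while (grep { $_ } values %started_child) {
    reaper();
    select(undef, undef, undef, 0.05);
}

sweep_reaped();
print "entries left: ", scalar(keys %started_child), "\n";
```

The point of the split is that the hash only changes shape at a moment the main code chooses, rather than whenever a SIGCHLD happens to arrive.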
Hi Clint,

thanks for your investigation. Since the problem is hard to reproduce and your patch looks rather unintrusive, I will add it to the next (==final) release. Expect a 1.00_1 by the end of this week.

thanks ...
CC: develop [...] traveljury.com
Subject: Re: [rt.cpan.org #55741] Working around a Perl bug
Date: Mon, 22 Mar 2010 12:55:48 +0100
To: bug-Schedule-Cron [...] rt.cpan.org
From: Clinton Gormley <clint [...] traveljury.com>
Hi Roland

> thanks for your investigation. Since the problem is
> hard to reproduce and your patch looks rather
> unintrusive, I will add it to the next (==final)
> release. Expect a 1.00_1 until the end of this week.

great

btw, so far my patch seems to be working in live. will let you know later on this week

ta

clint
Subject: Re: [rt.cpan.org #55741] Working around a Perl bug
Date: Mon, 22 Mar 2010 12:58:43 +0100
To: bug-Schedule-Cron [...] rt.cpan.org
From: Roland Huß <Roland.Huss [...] consol.de>
Hi,

On 22.03.2010, at 12:56, Clinton Gormley via RT wrote:
> Queue: Schedule-Cron
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=55741 >
>
> Hi Roland
>
>> thanks for your investigation. Since the problem is
>> hard to reproduce and your patch looks rather
>> unintrusive, I will add it to the next (==final)
>> release. Expect a 1.00_1 until the end of this week.
>
> great
>
> btw, so far my patch seems to be working in live. will let you know
> later on this week
>

Thanks. BTW, I pushed out a 1.00_1 to CPAN this morning, including this fix. Maybe you could give it a try, too.

ciao ...

--
...roland
> >> thanks for your investigation. Since the problem is
> >> hard to reproduce and your patch looks rather
> >> unintrusive, I will add it to the next (==final)
> >> release. Expect a 1.00_1 until the end of this week.

Just some follow up - my daemon hasn't died since I put the fix in, so it seems to have worked!

ta

clint
Subject: Re: [rt.cpan.org #55741] Working around a Perl bug
Date: Wed, 24 Mar 2010 18:34:28 +0100
To: bug-Schedule-Cron [...] rt.cpan.org
From: Roland Huß <Roland.Huss [...] consol.de>
Hi Clint,

On 24.03.2010, at 18:20, Clinton Gormley via RT wrote:
> Queue: Schedule-Cron
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=55741 >
>
>>>> thanks for your investigation. Since the problem is
>>>> hard to reproduce and your patch looks rather
>>>> unintrusive, I will add it to the next (==final)
>>>> release. Expect a 1.00_1 until the end of this week.
>
> Just some follow up - my daemon hasn't died since I put the fix in, so
> it seems to have worked!

Great news! If nothing big comes up, I will release Schedule::Cron 1.00 this weekend (finally, after 10+ years ;-)

BTW, 1.00_1 is already out, incorporating your fix (in a slightly modified way).

thanks for your investigations and your patches ...

--
...roland
Hi Roland

I've just come across an issue while using a reaper very similar to yours - $! was sometimes being set by waitpid, and interfering with code at a distance.

Whether this is an issue or not, this patch won't do any harm:

diff -ruN Schedule-Cron-1.00_1_a/lib/Schedule/Cron.pm Schedule-Cron-1.00_1_b/lib/Schedule/Cron.pm
--- Schedule-Cron-1.00_1_a/lib/Schedule/Cron.pm 2010-03-22 08:24:37.000000000 +0100
+++ Schedule-Cron-1.00_1_b/lib/Schedule/Cron.pm 2010-04-01 21:51:50.000000000 +0200
@@ -123,6 +123,7 @@
 );
 sub REAPER {
+    local $!;
     if ($HAS_POSIX) {
         foreach my $pid (keys %STARTEDCHILD) {

thanks

Clint
...uploaded the patch instead
Subject: localize_errno.patch
diff -ruN Schedule-Cron-1.00_1_a/lib/Schedule/Cron.pm Schedule-Cron-1.00_1_b/lib/Schedule/Cron.pm
--- Schedule-Cron-1.00_1_a/lib/Schedule/Cron.pm 2010-03-22 08:24:37.000000000 +0100
+++ Schedule-Cron-1.00_1_b/lib/Schedule/Cron.pm 2010-04-01 21:51:50.000000000 +0200
@@ -123,6 +123,7 @@
 );
 sub REAPER {
+    local $!;
     if ($HAS_POSIX) {
         foreach my $pid (keys %STARTEDCHILD) {
On second thoughts, worth local'ising %! as well, as that can also be set incorrectly
Subject: localize_errno_2.patch
diff -ruN Schedule-Cron-1.00_1_a/lib/Schedule/Cron.pm Schedule-Cron-1.00_1_b/lib/Schedule/Cron.pm
--- Schedule-Cron-1.00_1_a/lib/Schedule/Cron.pm 2010-03-22 08:24:37.000000000 +0100
+++ Schedule-Cron-1.00_1_b/lib/Schedule/Cron.pm 2010-04-01 22:00:10.000000000 +0200
@@ -123,6 +123,7 @@
 );
 sub REAPER {
+    local ($!,%!);
     if ($HAS_POSIX) {
         foreach my $pid (keys %STARTEDCHILD) {
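A minimal sketch of why localising $! in a reaper matters. It covers only $!, not %!, and it is not the module's code: reaper_unprotected() and reaper_protected() are hypothetical names. No children exist in this sketch, so waitpid fails with ECHILD, which is enough to overwrite the caller's errno.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG ENOENT);

# Without protection: a failing waitpid leaves its own error in $!.
sub reaper_unprotected {
    1 while waitpid(-1, WNOHANG) > 0;
}

# With local $!: the caller's value is restored when the sub returns.
sub reaper_protected {
    local $!;
    1 while waitpid(-1, WNOHANG) > 0;
}

# Pretend some earlier operation failed with ENOENT, then run each reaper.
$! = ENOENT;
reaper_protected();
my $kept = ($! == ENOENT);

$! = ENOENT;
reaper_unprotected();
my $clobbered = ($! != ENOENT);

printf "with local \$!:    %s\n", $kept      ? "preserved" : "clobbered";
printf "without local \$!: %s\n", $clobbered ? "clobbered" : "preserved";
```

Because a signal handler can run between any two statements of the main program, an unprotected reaper can change $! between a failing call and the code that inspects its error.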
Hiya Roland

With my patches in place, the daemon is running beautifully - I've had no further problems. Any news on the 1.00 release?

thanks

Clint
Hi Clint,

thanks for the reminder ;-) I just released 1.0 and pushed it to CPAN.

Thanks a lot for your patches and investigations ...

...roland
Incorporated patches which seem to fix the issue. Fixed in 1.00
CC: develop [...] traveljury.com
Subject: Re: [rt.cpan.org #55741] Working around a Perl bug
Date: Fri, 14 May 2010 17:46:03 +0200
To: bug-Schedule-Cron [...] rt.cpan.org
From: Clinton Gormley <clint [...] traveljury.com>
On Fri, 2010-05-14 at 11:41 -0400, Roland Huss via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=55741 >
>
> Hi Clint,
>
> thanks for the reminder ;-) I just released 1.0 and pushed it to CPAN.

w00t :)

many thanks

> Thanks a lot for your patches and investigations ...

np, and thanks for writing the module in the first place

clint

> ...roland
Hi Clint,

it's me again. While cleaning up my RT queue, I found an unhandled patch
for fixing an issue on Windows systems, where no SIGCHLD is fired at all
(RT #56926).

I put the fix into a 1.01_1 release and I'm quite confident that it won't
harm our current solution. (It's mainly about calling REAPER() on
POSIX systems when cleaning up the process list, so waitpid(-1,WNOHANG)
should return immediately when no finished children can be found.)

Nevertheless, I would be very happy if you could give release 1.01_1 a try, too,
before I release it more widely.

thanx again ...


Subject: Re: [rt.cpan.org #55741] Working around a Perl bug
Date: Sat, 15 May 2010 14:38:26 +0200
To: bug-Schedule-Cron [...] rt.cpan.org
From: Clinton Gormley <clint [...] traveljury.com>
> it's again me. While cleaning up my RT queue, I found an unhandled patch
> for fixing an issue on Windows systems where no SIGCHLD is fired at all.
> (RT #56926)

> Nevertheless, I would be very happy, if you could give this release 1.01_1 a
> try, too
> before I release it in the large.

will do - I'll install it today, just remind me to come back to you (cos if it just works, I'll forget :)

clint
On Sat May 15 08:38:42 2010, clint@traveljury.com wrote:
> > it's again me. While cleaning up my RT queue, I found an unhandled
> > patch for fixing an issue on Windows systems where no SIGCHLD is fired
> > at all.
> > (RT #56926)
>

Hi Roland

I've tried v 1.01_1 and it fails for me - whenever I run a system call (i.e. an external program) from one of my children, it exits with err -1.

ta

clint
Hi Clinton,

thanks for trying 1.01_1! Good to know that there are issues; in fact, I was afraid that
I had over-optimized the code a bit. I will dig into this again (I need some spare time, though)
and would be happy if I could contact you once I think I have a better solution.

thanx ...


I have now reverted it to the previous 1.00 behaviour, since I could reproduce the problem with 1.01_1.
The reason is that the reaper also reaped children which it had not forked itself
(e.g. those started by a system call).
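A minimal sketch of that difference, assuming a hypothetical %tracked table and helper names (this is not the module's code). A per-pid reaper leaves unrelated children alone, while waitpid(-1, ...) collects any child; whoever really owns that child then gets -1 from its own wait, which is what a system call would see.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

$SIG{CHLD} = 'DEFAULT';   # make sure children are not auto-reaped

my %tracked;   # hypothetical table of children the scheduler forked itself

# Fork a child that exits immediately; return its pid to the parent.
sub spawn {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;
    return $pid;
}

# Reap only the children we started ourselves.
sub reap_tracked {
    local $!;
    for my $pid (keys %tracked) {
        next unless $tracked{$pid};
        $tracked{$pid} = 0 if waitpid($pid, WNOHANG) > 0;
    }
}

my $own = spawn();
$tracked{$own} = 1;
my $foreign = spawn();    # stands in for a child forked by system()

# Per-pid reaping: poll until our own child has been collected.
until ($tracked{$own} == 0) {
    reap_tracked();
    select(undef, undef, undef, 0.05);
}
my $owner_wait_ok = waitpid($foreign, 0);      # the owner still gets its child

# Catch-all reaping: the only remaining child is taken away from its owner.
my $foreign2 = spawn();
my $stolen           = waitpid(-1, 0);         # blocks until $foreign2 exits
my $owner_wait_fails = waitpid($foreign2, 0);  # nothing left to wait for

print "per-pid reaper: owner's waitpid returned its child's pid\n"
    if $owner_wait_ok == $foreign;
print "catch-all reaper: owner's waitpid returned $owner_wait_fails\n";
```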

This version should be safe again. Since it has been a long time since
the last update, you have probably moved on from this topic ;-) but if you could verify 1.01_2,
that would be great. I still have to look at how to fix #56926, though.

bye ...
... roland

1.01 now exhibits the same behaviour as 1.00, which was confirmed to
fix this bug. So I will finally close this ticket.

thanks ...
... roland