This queue is for tickets about the MCE CPAN distribution.

Report information
The Basics
Id: 111780
Status: resolved
Priority: 0/
Queue: MCE

People
Owner: Nobody in particular
Requestors: mckeowbc [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 1.700



Subject: MCE locks up if a worker quits in a non-graceful manner
Date: Thu, 04 Feb 2016 16:57:10 +0000
To: "bug-MCE [...] rt.cpan.org" <bug-MCE [...] rt.cpan.org>
From: "Benjamin C. McKeown" <mckeowbc [...] gmail.com>
I've noticed that if an MCE worker dies in such a way that its END block does not get called, the MCE manager will block on a socket call and never terminate. We see this occasionally when using XS modules in a worker that can cause Perl to die in an uncontrollable manner. We've also occasionally seen it happen when a Perl worker runs out of memory. The following code demonstrates the behavior.

---
use MCE::Loop;

mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    foreach my $item (@{$chunk_ref}) {
        if ($item > 100 and $item % 9 == 0) {
            MCE->say(\*STDERR, "Worker " . MCE->wid . " dying in a bad way from item $item");
            kill 9, $$;
        }
    }
} 1..10_000;
---

I've verified that the problem still exists in 1.699_009. I've been able to get MCE to exit gracefully in this scenario by patching MCE::Core::Manager to wrap all reads from $_DAT_R_SOCK with a select statement, then call waitpid and clean up the _total_running and _total_workers counts if no data is received after a reasonable timeout.
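The workaround described above (a select-guarded read followed by waitpid on timeout) can be sketched in plain Perl without MCE. This is only an illustration of the idea, not the actual patch: the helper name read_with_timeout, the %running hash, and the 1-second timeout are all hypothetical, and a pipe stands in for $_DAT_R_SOCK. It assumes a UNIX-like system with fork.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX qw(WNOHANG);

# Hypothetical helper: wait up to $timeout seconds for $fh to become
# readable, then read up to $len bytes. Returns the data read, or undef
# on timeout, EOF, or read error.
sub read_with_timeout {
    my ($fh, $len, $timeout) = @_;
    my @ready = IO::Select->new($fh)->can_read($timeout);
    return undef unless @ready;
    my $n = sysread($fh, my $buf, $len);
    return $n ? $buf : undef;
}

# A pipe stands in for the manager's data socket. The parent keeps the
# write end open on purpose, so the dead worker does not produce an EOF.
pipe(my $r, my $w) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    kill 'KILL', $$;     # die without running END blocks
    POSIX::_exit(1);     # not reached
}

my %running = ($pid => 1);    # stand-in for the manager's worker counts

my $data = read_with_timeout($r, 2, 1);
if (!defined $data) {
    # Nothing arrived in time: reap any worker that has already died.
    for my $p (keys %running) {
        my $res = waitpid($p, WNOHANG);
        delete $running{$p} if $res == $p || $res == -1;
    }
}

print "workers still running: ", scalar(keys %running), "\n";
```

A plain blocking read on $r would hang here indefinitely, which mirrors the reported lockup. The cost is one extra select call per read.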
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Thu, 4 Feb 2016 12:14:55 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hello Benjamin, Can you provide the mentioned patch? That will be helpful. Thanks, Mario
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Thu, 4 Feb 2016 12:18:42 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
I forgot to ask, are you running on Windows or a version of UNIX; e.g. AIX, BSD, Darwin, Linux, Solaris?
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Thu, 4 Feb 2016 12:39:54 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
To not impact performance, another way is via an ALARM signal inside MCE::Core::Manager. The ALARM handler decrements total running/workers if a process is no longer available. This is likely to work on UNIX for child processes.

What is a reasonable amount of time?

1 minute
2 minutes
5 minutes
10 minutes

Regards,
Mario
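The ALARM idea above can be sketched outside of MCE as follows. This is a rough illustration under stated assumptions, not the code that went into MCE::Core::Manager: the %running hash, the reap_dead_workers helper, and the 1-second interval are hypothetical, and a pipe stands in for the manager's socket. It relies on alarm and fork, so it applies to UNIX-like systems only.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %running;    # stand-in for the manager's running-worker count

# Hypothetical helper: drop every worker that has already exited.
sub reap_dead_workers {
    for my $pid (keys %running) {
        my $res = waitpid($pid, WNOHANG);
        delete $running{$pid} if $res == $pid || $res == -1;
    }
}

# The parent keeps the write end open, so a dead worker yields no EOF.
pipe(my $r, my $w) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    kill 'KILL', $$;     # die without running END blocks
    POSIX::_exit(1);     # not reached
}
$running{$pid} = 1;

my $got_data = 0;
while (%running) {
    my $ok = eval {
        # The handler reaps dead workers, then aborts the blocking read.
        local $SIG{ALRM} = sub { reap_dead_workers(); die "alarm\n" };
        alarm 1;
        my $n = sysread($r, my $buf, 2);    # blocks: nobody writes
        alarm 0;
        $got_data = 1 if $n;
        1;
    };
    alarm 0;
    die $@ if !$ok && $@ ne "alarm\n";      # rethrow unrelated errors
}

print "all workers accounted for\n";
```

Unlike a select call before every read, the read path itself is unchanged here. The extra work happens only when the alarm fires.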
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Thu, 04 Feb 2016 20:03:59 +0000
To: bug-MCE [...] rt.cpan.org
From: "Benjamin C. McKeown" <mckeowbc [...] gmail.com>
Mario, Here's the patch I created for MCE::Core::Manager 1.699_009. I used some Perl Cookbook code to implement a sysreadline call that wraps all of the reads in a select statement. Maybe not the most efficient thing, but it seems to work. I haven't benchmarked it though. As for a timeout, 2 minutes seems reasonable. It might actually be a nice feature to be able to specify a timeout for how long a worker can spin on a chunk until the manager tears it down. Just a thought. Thanks much. -Ben

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Thu, 4 Feb 2016 21:27:29 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hello Benjamin.

That is quite a patch. You have done all the right things. I'm impressed.

There is an edge case though. Other workers may call MCE->exit while the manager process is calling waitpid inside the if block. Thus, worker(s) must obtain a reply from the manager process to not corrupt total_exited/running/workers.

  <$_DAU_W_SOCK>;  # added to MCE->exit before $_DAT_LOCK->unlock()

Likewise, the manager process must send a reply after all reads inside the OUTPUT_W_EXT handler.

  print {$_DAU_R_SOCK} $LF;  # added to OUTPUT_W_EXT before reaping child/thread

That completes your patch. I applied a small correction for sysread.

  from...: $channel = sysread($_DAT_R_SOCK, 2);
  to this: sysread($_DAT_R_SOCK, $channel, 2);

It's time to benchmark this stuff, shall we. I tried an alternative strategy via an *ALRM* handler. The goal is for this to not impact the manager process.

  perl mce-examples/other/forseq.pl 50000 >/dev/null

  Manager.pm.patch   1.257 seconds
  Manager.pm (orig)  0.334 seconds
  Manager.pm (alrm)  0.343 seconds

There is a new MCE option in trunk, *loop_timeout*. The option is disabled by default and ignored on the Windows platform and for MCE workers spawned as threads.

https://github.com/marioroy/mce-perl/commit/23db32ba365686f651eaafd9155d3b8efabab32a

  MCE::Loop::init( loop_timeout => 5 );

Thank you for reporting. - mario
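The sysread correction above follows from the built-in's signature: sysread takes the destination buffer as its second argument and returns the number of bytes read, not the data. A minimal sketch, using a pipe in place of $_DAT_R_SOCK (the "42rest" payload is made up for illustration):

```perl
use strict;
use warnings;

# A pipe stands in for the manager's data socket.
pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "42rest") or die "syswrite: $!";

# sysread(FILEHANDLE, SCALAR, LENGTH) fills SCALAR and returns the byte count.
my $channel;
my $n = sysread($r, $channel, 2);
die "sysread: $!" unless defined $n;

print "read $n bytes: $channel\n";
```

Assigning the return value to $channel would store the byte count rather than the two bytes read.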
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Mon, 8 Feb 2016 20:22:06 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Benjamin, Today, released 1.699_010 containing the fix. https://metacpan.org/release/MARIOROY/MCE-1.699_010 All looks well and hoping for a final 1.7 sometime this month. I completed 360 as far as MCE::Shared functionality with 1.699_010. My plan now is completing the rest of the documentation. Regards, Mario
Hello Benjamin, Am very pleased to announce that MCE 1.700 has been released. Kind regards, Mario
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Tue, 08 Mar 2016 22:50:27 +0000
To: bug-MCE [...] rt.cpan.org
From: "Benjamin C. McKeown" <mckeowbc [...] gmail.com>
That's great news, thanks for letting me know.
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Sun, 20 Mar 2016 01:22:24 -0400
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Benjamin, The MCE 1.703 and MCE::Shared 1.001 distributions are completed. MCE supports Perl v5.8.0 and higher, whereas MCE::Shared supports Perl v5.10.1 and higher. MCE::Hobo is included with MCE::Shared. https://metacpan.org/pod/distribution/MCE/lib/MCE.pod https://metacpan.org/pod/MCE::Shared https://metacpan.org/pod/MCE::Hobo This stuff is powerful. All environments run optimally including Cygwin and Windows. I'm very glad to have reached the finish line, at last. Regards, Mario
Subject: Re: [rt.cpan.org #111780] MCE locks up if a worker quits in a non-graceful manner
Date: Sun, 20 Mar 2016 11:05:59 +0000
To: bug-MCE [...] rt.cpan.org
From: "Benjamin C. McKeown" <mckeowbc [...] gmail.com>
Nice! Thanks. I'll have to give it a try tomorrow if I get time. -Ben