
This queue is for tickets about the POE CPAN distribution.

Report information
The Basics
Id: 45109
Status: resolved
Priority: 0/
Queue: POE

People
Owner: Nobody in particular
Requestors: dolmen [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.004
Fixed in: (no value)



Subject: Deep recursion on subroutine "POE::Kernel::_loop_signal_handler_chld"
Testing POE 1.004 on HP-UX B.11.11, PARISC:

t/90_regression/rt39872-sigchld-stop.t ....... # Wait 3
24086: Deep recursion on subroutine "POE::Kernel::_loop_signal_handler_chld" at /home/nagios/agent/.cpan/build/POE-1.004-vS4Y6T/blib/lib/POE/Loop/PerlSignals.pm line 59.
Pid 24086 received a SIGSEGV for stack growth failure.
Possible causes: insufficient memory or swap space, or stack size exceeded maxssiz.
Here is the test output:

$ perl -I blib/lib t/90_regression/rt39872-sigchld-stop.t
1..3
8368: _start at t/90_regression/rt39872-sigchld-stop.t line 59.
8368: Parent at t/90_regression/rt39872-sigchld-stop.t line 49.
8369: child at t/90_regression/rt39872-sigchld-stop.t line 106.
8370: child at t/90_regression/rt39872-sigchld-stop.t line 106.
# Wait 3
8368: parent at t/90_regression/rt39872-sigchld-stop.t line 96.
8369: USR1 at t/90_regression/rt39872-sigchld-stop.t line 115.
<rc> decremented extref ``USR1'' (now 0) for session 2 (worker) at blib/lib/POE/Resource/Extrefs.pm line 100
<rc> decrementing refcount for session 2 (worker) at blib/lib/POE/Resource/Sessions.pm line 331
<rc> decrementing refcount for session 2 () at blib/lib/POE/Resource/Sessions.pm line 331
<rc> +----- GC test for session 2 () (POE::Session=ARRAY(0x80000001006d15e8)) -----
<rc> | total refcnt : 0
<rc> | event count : 0
<rc> | post count : 0
<rc> | child sessions: 0
<rc> | handles in use: 0
<rc> | aliases in use: 0
<rc> | extra refs : 0
<rc> | pid count : 0
<rc> +---------------------------------------------------
<rc> | session 2 () is garbage; stopping it...
<rc> +---------------------------------------------------
<rc> | called at blib/lib/POE/Resource/Sessions.pm line 408
  POE::Kernel::_data_ses_collect_garbage('POE::Kernel=ARRAY(0x800000010025b4e8)','POE::Session=ARRAY(0x80000001006d15e8)') called at blib/lib/POE/Resource/Signals.pm line 430
  POE::Kernel::_data_sig_free_terminated_sessions('POE::Kernel=ARRAY(0x800000010025b4e8)') called at blib/lib/POE/Kernel.pm line 988
  POE::Kernel::_dispatch_event('POE::Kernel=ARRAY(0x800000010025b4e8)','POE::Kernel=ARRAY(0x800000010025b4e8)','POE::Kernel=ARRAY(0x800000010025b4e8)','_signal',16,'ARRAY(0x80000001006d21c8)','blib/lib/POE/Loop/PerlSignals.pm',35,'undef',...) called at blib/lib/POE/Resource/Events.pm line 265
  POE::Kernel::_data_ev_dispatch_due('POE::Kernel=ARRAY(0x800000010025b4e8)') called at blib/lib/POE/Loop/Select.pm line 327
  POE::Kernel::loop_do_timeslice('POE::Kernel=ARRAY(0x800000010025b4e8)') called at blib/lib/POE/Loop/Select.pm line 335
  POE::Kernel::loop_run('POE::Kernel=ARRAY(0x800000010025b4e8)') called at blib/lib/POE/Kernel.pm line 1291
  POE::Kernel::run('POE::Kernel=ARRAY(0x800000010025b4e8)') called at t/90_regression/rt39872-sigchld-stop.t line 50
8369: _stop at t/90_regression/rt39872-sigchld-stop.t line 125.
<rc> decrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 331
<rc> ,----- Kernel Activity -----
<rc> | Events : 0
<rc> | Files : 0
<rc> | Extra : 0
<rc> | Procs :
<rc> `---------------------------
<rc> ... at blib/lib/POE/Kernel.pm line 622
<rc> incrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 358
<rc> incrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 358
<rc> decrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 331
<rc> decrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 331
<rc> incrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 358
<rc> incrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 358
<rc> decrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 331
<rc> decrementing refcount for session HP_P1-49ec3074000020b0 (POE::Kernel=ARRAY(0x800000010025b4e8)) at blib/lib/POE/Resource/Sessions.pm line 331
8369: Exit at t/90_regression/rt39872-sigchld-stop.t line 52.
# No tests run!
Pid 8368 received a SIGSEGV for stack growth failure.
Possible causes: insufficient memory or swap space, or stack size exceeded maxssiz.
Memory fault(coredump)
Attaching tusc (Trace Unix System Calls, the HP-UX equivalent of 'truss') output for another run:

$ tusc perl -I blib/lib t/90_regression/rt39872-sigchld-stop.t > rt45109.txt


Subject: Deep recursion on subroutine "POE::Kernel::_loop_signal_handler_chld" with USE_SIGCHLD
Disabling USE_SIGCHLD in the test makes it pass. It looks like my perl (5.8.0 on HP-UX PARISC) is too old.

Could you help me diagnose whether this bug is critical? Will I encounter problems with my application, which uses POE::Wheel::Run and monitors SIGCHLD without USE_SIGCHLD?
On Mon Apr 20 05:58:28 2009, DOLMEN wrote:
> Disabling USE_SIGCHLD in the test makes it pass.
> It looks like my perl (5.8.0 on hpux PARISC) is too old.
>
> Could you help me to diagnose if this bug is critical?
> Will I encounter some problems with my application which uses
> POE::Wheel::Run and monitors SIGCHLD without USE_SIGCHLD?
I've attached a SIGCHLD handler test case that doesn't use POE. It tries to mimic POE's SIGCHLD handler, and it tries to detect deep recursion before Perl crashes. Please try the test case and add the output to this ticket. Thank you for your help.

The deep recursion is happening in POE's SIGCHLD handler at the point where $SIG{CHLD} is reset. My best guess is that resetting $SIG{CHLD} immediately triggers another SIGCHLD, causing the SIGCHLD handler to be called from within itself. Deep recursion results.

It may be that HP-UX behaves slightly differently in this regard. In that case, Perl should be smoothing over the differences so that its basic signal handling behaves the same everywhere.

Perl 5.8.0 is rather buggy, as Perl versions go. It seems likely that a newer version of Perl fixes this issue. I would try the test case with Perl 5.8.0 to see whether it reproduces the problem. I would then try it with 5.8.8 to see whether the problem has been fixed in Perl.

If 5.8.8 also recurses deeply, I would ask the perl5porters list to try the test case and say whether it's a Perl issue or a known behavior. If they decide the behavior should stand, then I would consider restructuring POE's signal handler to work around it.
#!perl

use warnings;
use strict;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

# Child sleeps briefly, then exits.
unless ($pid) {
  print "$$ child sleeping...\n";
  sleep 3;
  print "$$ child exiting.\n";
  exit;
}

# Parent keeps busy until the child exits.
$SIG{CHLD} = \&handle_sigchld;
my $recursion_level = 0;
my $got_child = 0;
while (not $got_child) {
  print "$$ parent waiting for child.\n";
  sleep 1;
}

# And to be clean about it, reap the child when we're done.
my $reaped = wait();
print "$$ parent spawned $pid, reaped $reaped\n";
exit;

# Testing a potential HP-UX issue related to rt.cpan.org ticket 45109.
# The error message implies that a SIGCHLD handler is being called
# immediately upon setting $SIG{CHLD}, if there is a child to be
# reaped.  In POE's case, this could retrigger the SIGCHLD handler
# recursively.  The following SIGCHLD handler tries to re-create and
# analyze the conditions.
sub handle_sigchld {
  $got_child++;
  $recursion_level++;
  print "$$ parent SIGCHLD handler recursion level $recursion_level\n";
  if ($recursion_level > 9) {
    print "$$ platform issue on $^O: deep recursion in SIGCHLD handler\n";
    return;
  }
  $SIG{CHLD} = \&handle_sigchld;
  $recursion_level--;
}
Here is the test case output on HP-UX B.11.11 IA64 with Perl 5.8.8:

--------8<------8<------8<------8<------8<------
This is perl, v5.8.8 built for IA64.ARCHREV_0-thread-multi-LP64
(with 26 registered patches, see perl -V for more detail)

Copyright 1987-2006, Larry Wall

Binary build 817 [257965] provided by ActiveState http://www.ActiveState.com
Built Mar 20 2006 20:52:50
--------8<------8<------8<------8<------8<------
2297 child sleeping...
2296 parent waiting for child.
2296 parent waiting for child.
2296 parent waiting for child.
2297 child exiting.
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent SIGCHLD handler recursion level 1
2296 parent spawned 2297, reaped 2297
--------8<------8<------8<------8<------8<------

It looks like there is in fact a kind of recursion: when the handler is reinstalled, it is called again after it exits. It runs 8 times in total here, each time at recursion level 1.
The test t/90_regression/rt39872-sigchld-stop.t is now passing with POE 1.006.

The CHANGES file for POE 1.006 says something has been discovered and fixed on SunOS related to the same test case. What has changed: POE or the test case?
On Tue Jun 02 09:49:03 2009, DOLMEN wrote:
> The test t/90_regression/rt39872-sigchld-stop.t is now passing with POE
> 1.006.
>
> The CHANGES of POE 1.006 says something has been discovered and fixed
> on SunOS related to the same test case. What has changed: POE or the
> test case?
A different test case was changed (t/90_regression/rt39872-sigchld.t). The change should not have affected this test. Perhaps there is a race condition that doesn't always fail the same way? Thank you for including the new test case output. I'll look at it after work this week.
Hello. Sorry for the delay. Signal handling has changed significantly in POE 1.007 (already on CPAN). Could you see whether this has fixed things for you?
I've deferred resetting $SIG{CHLD} and $SIG{CLD} until after the waitpid() loop, which is the recommended way to do things on SysV-ish systems (as I assume HP-UX is). The fix is in revision 2667, which will go out with the 1.267 release sometime this weekend.
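For readers following along, the general pattern described above can be sketched in plain Perl, without POE. This is only an illustration, not the code from revision 2667: the handler name, the %reaped hash, the number of children, and the 10-second time limit are all made up for the example. The point it shows is the ordering: every exited child is reaped with waitpid() first, and $SIG{CHLD} is re-installed only afterwards.

```perl
#!perl

use warnings;
use strict;
use POSIX qw(:sys_wait_h);    # for WNOHANG

my %reaped;    # pid => exit status, filled in by the handler (hypothetical name)

# Reap every exited child first, and only then re-install the handler.
# On SysV-style systems, re-installing the handler while an unreaped
# child still exists may deliver SIGCHLD again right away.
sub handle_sigchld {
  while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
    $reaped{$pid} = $? >> 8;
  }
  $SIG{CHLD} = \&handle_sigchld;
}

$SIG{CHLD} = \&handle_sigchld;

# Spawn a few short-lived children.  The count is arbitrary.
my @children;
for my $n (1 .. 3) {
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  unless ($pid) {
    exit $n;    # child: exit at once with a recognizable status
  }
  push @children, $pid;
}

# Parent waits until the handler has reaped every child, or gives up
# after an arbitrary time limit.
my $deadline = time() + 10;
while (keys(%reaped) < @children and time() < $deadline) {
  sleep 1;
}

print "reaped ", scalar(keys %reaped), " of ", scalar(@children), " children\n";
```

Because the waitpid() loop drains all pending children before the handler is re-installed, there should be nothing left to retrigger SIGCHLD at the moment $SIG{CHLD} is assigned. Whether a given platform and Perl version retriggers at that point at all is exactly what the earlier test case in this ticket was probing.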
All tests pass for POE-1.268 on HP-UX.
Thank you for confirming the fix.