Bug #23716 for forks: Bug in perl-forks in SIGCHLD signal handler

Thu Nov 30 04:47:22 2006 alban.crequy [...] seanodes.com - Ticket created

CC:	Pierre Phaneuf <pierre.phaneuf [...] seanodes.com>
Subject:	Bug in perl-forks in SIGCHLD signal handler
Date:	Thu, 30 Nov 2006 10:46:28 +0100
To:	bug-forks [...] rt.cpan.org
From:	Alban Crequy <alban.crequy [...] seanodes.com>

Hello,

I have a bug in perl-forks-0.19 in the SIGCHLD signal handler. Having read the code of perl-forks-0.20, I think the bug is still present.

=== Hypothesis ===

In forks.pm:

    unless ($FORCE_SIGCHLD_IGNORE) {
        local $ENV{PATH} = "/bin:/usr/bin";
        if (system('/bin/test') == -1) {
            $SIG{CHLD} = sub { reaper::REAPER ( shift, \&REAPER ); };
            $CUSTOM_SIGCHLD = 1;
        } else {
            $CUSTOM_SIGCHLD = 0;
        }
    }

I don't know what the condition with /bin/test means. I guess it is meant to tell whether we are running on Windows or Linux (?). Anyway, on Mandriva Linux, '/bin/test' does not exist: the "test" program is in /usr/bin, not in /bin.

    $ echo 'if (system("/bin/test") == -1) { print "yes\n"; } else { print "no\n" } '|perl
    yes

Maybe you want to use the $^O special variable? In "man perlvar":

    $OSNAME
    $^O     The name of the operating system (...)

=== Symptoms ===

    $ perl -v, uname -a, distro
    This is perl, v5.8.7 built for i386-linux
    Linux alban 2.6.12-25mdk #1 Fri Aug 18 15:09:47 MDT 2006 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz unknown GNU/Linux
    Mandriva Linux release 2006.0 (Official) for i586

    $ ps f
    15317 pts/18 S+ 0:14 \_ /usr/bin/perl mytest
    15319 pts/18 S+ 0:00 \_ /usr/bin/perl mytest
    16405 pts/18 Z+ 0:00 \_ [mytest] <defunct>

    $ strace -p 15317
    read(3, <unfinished ...>
    $ strace -p 15319
    select(16, [6 7], NULL, NULL, NULL

    $ ls -l /proc/1531[79]/fd/
    /proc/15317/fd/:
    total 5
    lrwx------ 1 acrequy dev 64 Nov 30 09:01 0 -> /dev/pts/18
    lrwx------ 1 acrequy dev 64 Nov 30 09:01 1 -> /dev/pts/18
    lrwx------ 1 acrequy dev 64 Nov 30 08:45 2 -> /dev/pts/18
    lrwx------ 1 acrequy dev 64 Nov 30 09:01 3 -> socket:[34560951]
    lrwx------ 1 acrequy dev 64 Nov 30 09:01 7 -> socket:[34548781]

    /proc/15319/fd/:
    total 8
    lrwx------ 1 acrequy dev 64 Nov 30 09:02 0 -> /dev/pts/18
    lrwx------ 1 acrequy dev 64 Nov 30 09:02 1 -> /dev/pts/18
    lrwx------ 1 acrequy dev 64 Nov 30 08:45 2 -> /dev/pts/18
    lr-x------ 1 acrequy dev 64 Nov 30 09:02 3 -> /usr/bin/...
    lr-x------ 1 acrequy dev 64 Nov 30 09:02 4 -> /usr/lib/...
    lr-x------ 1 acrequy dev 64 Nov 30 09:02 5 -> /usr/lib/...
    lrwx------ 1 acrequy dev 64 Nov 30 09:02 6 -> socket:[34548779]
    lrwx------ 1 acrequy dev 64 Nov 30 09:02 7 -> socket:[34548782]

Processes 15317 and 15319 are deadlocked. I tried running:

    kill -SIGCHLD 15317 15319

but nothing happens: the processes stay deadlocked and strace shows nothing new (they stay in read/select).

=== Reproducible ===

Reproduced 2 times. However, I have to run my program for ~12 hours before the bug happens. I do not know how to reproduce the bug on demand.

Have a nice day,
Alban
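
A note on the /bin/test check from the Hypothesis section: system() returns -1 both when the program cannot be started and when the wait for the child fails, so a missing /bin/test gives the same return value as an ignored SIGCHLD. The sketch below is only an illustration, not code from forks; the helper name classify_system_result and its return strings are made up for this example. It assumes a Unix-like system and tells the cases apart by inspecting $! right after the call.

```perl
use strict;
use warnings;
use Errno qw(ENOENT ECHILD);

# Hypothetical helper (not part of forks): run a program with system()
# and explain a -1 result by looking at $!.
sub classify_system_result {
    my ($path) = @_;
    no warnings 'exec';                  # silence "Can't exec" for missing files
    my $rc    = system { $path } $path;  # list form: no shell involved
    my $errno = $! + 0;                  # capture errno right away
    return 'started'        if $rc != -1;
    return 'missing-binary' if $errno == ENOENT;
    return 'no-child'       if $errno == ECHILD;   # e.g. SIGCHLD is ignored
    return 'other-failure';
}

print "OS: $^O\n";
print "/bin/test: ", classify_system_result('/bin/test'), "\n";
```

On a machine where /bin/test is absent, as on the Mandriva box above, this should report the missing binary instead of being mistaken for the SIGCHLD case.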

Sun Dec 10 00:53:32 2006 RYBSKEJ [...] cpan.org - Taken

Sun Dec 10 01:14:19 2006 RYBSKEJ [...] cpan.org - Correspondence added

Regarding the system() call using /bin/test: it is intended to be a simple system call that checks whether system() returns -1 when the SIGCHLD handler is set to 'IGNORE'. If I do get -1, then I have to install a custom SIGCHLD handler to reap processes. However, to ensure that perl safe signals don't cause havoc with thread processes using blocking recv(), forks currently defers the CHLD signal until it gets a response from the forks server process (if it was communicating with that process at the time the signal occurred). Thus, thread processes should appear to "ignore" SIGCHLD signals while they are blocking on system read or write functions.

Now, regarding the "deadlocked" threads you describe, can you give a few more details? From your post, I see you have two processes and one defunct one:

>$ ps f
>15317 pts/18 S+ 0:14 \_ /usr/bin/perl mytest
>15319 pts/18 S+ 0:00 \_ /usr/bin/perl mytest
>16405 pts/18 Z+ 0:00 \_ [mytest] <defunct>

$ strace -p 15317
read(3, <unfinished ...>
$ strace -p 15319
select(16, [6 7], NULL, NULL, NULL

Some general questions:

- What is the process hierarchy here? Specifically, can you report the PPID of these processes? Note: Based on the fact that 15319 is doing a select, I assume it is the central managing server process, leaving 15317 as the main thread, which looks to be blocking while waiting for a response from the server. I assume 16405 is some (grand)child of 15317 that has yet to be reaped.
- Is either of the still-running processes using high CPU, or are they mostly waiting?
- Can you provide a small code sample that shows what your threaded app might have been doing at the time this issue occurred?
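
For reference, the general idea of a custom SIGCHLD handler that reaps exited children (so they do not linger as <defunct> entries) can be sketched as follows. This is a generic, minimal pattern and not the actual REAPER code in forks; the names reap_children and %EXIT_STATUS are hypothetical, and the deferral of CHLD during server communication described above is not modelled here.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

our %EXIT_STATUS;   # hypothetical bookkeeping: pid => exit code

# Reap every child that has already exited, without blocking.
# One signal may stand for several exited children, hence the loop.
sub reap_children {
    local ($!, $?);   # do not clobber the interrupted code's error state
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $EXIT_STATUS{$pid} = $? >> 8;
    }
}

$SIG{CHLD} = \&reap_children;
```

With a handler like this installed, system() no longer hits the failing wait that occurs when SIGCHLD is set to 'IGNORE', at the cost of having to manage the signal carefully around blocking I/O.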

Sun Dec 10 01:14:21 2006 The RT System itself - Status changed from 'new' to 'open'

Sat Sep 29 18:40:36 2007 RYBSKEJ [...] cpan.org - Correspondence added

This bug appears not to have recurred since signal handling, socket management, and other miscellaneous core improvements were made after forks 0.19 (the current release as of this writing is 0.25). Thus, I consider this bug resolved and closed.

Sat Sep 29 18:40:40 2007 RYBSKEJ [...] cpan.org - Status changed from 'open' to 'resolved'