Bug #50398 for IPC-Cmd: IPC::Cmd timeout issue

Sun Oct 11 10:39:25 2009 kane [...] cpan.org - Ticket created

Subject:

IPC::Cmd timeout issue

Hello, Jos. I've ran into an issue with IPC::Cmd when using timeout feature combined with _open3_run or _system_run calls. When timeout happens IPC::Cmd returns control to my program, but does not kill running child. If using _cmd_run call -- child is killed (which is done by Cmd::Run internally upon receiving SIGALRM). I've implemented killing functionality for open3 and system codepaths. For_open3_run -- by signaling $pid returned by open3 with -9. For _system_run -- by signaling myself with -15, masking it before so the caller does not die -- not sure that this is quite clean but anyway _system_run is used as a last resort so we might think it will not be used in serious programs. Here's the patch against IPC::Cmd 0.46: --- Cmd.pm.orig 2009-06-12 17:40:03.000000000 +0400 +++ Cmd.pm 2009-08-21 11:48:09.000000000 +0400 @@ -426,6 +426,8 @@ ### if we are allowed to run verbose, just dispatch the system command } else { + $self->{'need_term'} = 1; + $self->_debug( "# Using system(). Have buffer: $have_buffer" ) if $DEBUG; $ok = $self->_system_run( $cmd, $verbose ); @@ -443,7 +445,16 @@ ### alarm happened if ( $@ and ref $@ and $@->isa( ALARM_CLASS ) ) { $err = $@->(); # the error code is an expired alarm + + kill -9, $self->{'child_pid'} if $self->{'child_pid'}; + if ($self->{'need_term'}) { + my $old_term = $SIG{TERM}; + local $SIG{TERM} = sub { return 1; }; + kill -15, $$ if $self->{'need_term'}; + local $SIG{TERM} = $old_term; + } + ### another error happened, set by the dispatchub } else { $err = $self->error; @@ -517,6 +528,8 @@ return; }; + $self->{'child_pid'} = $pid; + ### use OUR stdin, not $kidin. Somehow, ### we never get the input.. so jump through ### some hoops to do it :( If you think that the patch is ok, it would be really nice if you could release the next version of IPC::Cmd in near future, because the program which uses it is going to be installed on several thousands of servers so it would be handy to have the new version in CPAN rather than distributing the patched version by other means. Thanks for attention and for the module too. Regards, Petya. ps: not sure about your emails, sending the letter both to cpan.org and your personal e-mail.

Sun Oct 11 10:39:37 2009 kane [...] cpan.org - Correspondence added

Jos, there's another smaller issue with IPC::Cmd. When used on CPU intensive long-running tasks current open3 implementation begins to eat up CPU because of the read loop. To solve the problem we could use a small sleep in each read cycle. Here's the patch combined with previous patch: --- Cmd.pm.orig 2009-06-12 17:40:03.000000000 +0400 +++ Cmd.pm 2009-08-23 01:59:43.000000000 +0400 @@ -33,6 +33,7 @@ use Text::ParseWords (); # import ONLY if needed! use Module::Load::Conditional qw[can_load]; use Locale::Maketext::Simple Style => 'gettext'; +use Time::HiRes qw[usleep]; =pod @@ -426,6 +427,8 @@ ### if we are allowed to run verbose, just dispatch the system command } else { + $self->{'need_term'} = 1; + $self->_debug( "# Using system(). Have buffer: $have_buffer" ) if $DEBUG; $ok = $self->_system_run( $cmd, $verbose ); @@ -443,7 +446,16 @@ ### alarm happened if ( $@ and ref $@ and $@->isa( ALARM_CLASS ) ) { $err = $@->(); # the error code is an expired alarm + + kill -9, $self->{'child_pid'} if $self->{'child_pid'}; + if ($self->{'need_term'}) { + my $old_term = $SIG{TERM}; + local $SIG{TERM} = sub { return 1; }; + kill -15, $$ if $self->{'need_term'}; + local $SIG{TERM} = $old_term; + } + ### another error happened, set by the dispatchub } else { $err = $self->error; @@ -517,6 +529,8 @@ return; }; + $self->{'child_pid'} = $pid; + ### use OUR stdin, not $kidin. Somehow, ### we never get the input.. so jump through ### some hoops to do it :( @@ -535,7 +549,7 @@ my $stdout_done = 0; my $stderr_done = 0; OUTER: while ( my @ready = $selector->can_read ) { - + usleep(1); for my $h ( @ready ) { my $buf; Regards, Petya.

Sun Oct 11 10:39:38 2009 kane [...] cpan.org - Status changed from 'new' to 'open'

Sun Oct 11 10:39:56 2009 kane [...] cpan.org - Correspondence added

Jos, there is an error in my previous patch when killing timeouted processes. Actually I've thought about the problem and understood that we can't kill process group (what I was trying to do) because people run the function in their process. On the other hand it's obviously not enough to kill just the child process (pid that we've got from open3 call), because it might spawn other children. So I see two solutions here. First one is find all the children spawned by pid return by open3 call (using Proc::ProcessTable) which is quite simple but adds another requirement for your module. Second one is rewrite function logic: do fork in run_open3 and run open3 in child creating separate process group. Parent will wait for the child and terminate child process group in case of timeout. Parent also should get open3 stdout and stderr buffers from child (sockets, for example). Second solution seems to be more complex. As you seem to be busy at the moment and I can't wait that long -- in my application I've dropped use of IPC::Cmd and almost copied your code from open3_run and implemented 1st child killing solution (using ProcessTable). I'm not sure how well this works under Windows, but I'll have to make it work under Linux and FreeBSD. If you need the code -- let me know, I will do the patch for you. Regards, Petya.

Sun Oct 11 10:40:07 2009 kane [...] cpan.org - Correspondence added

Hello, colleagues. Several months ago I needed a function in perl to execute program, get the output and optionally kill the program if it runs too long. I've tried IPC::Run, IPC::Cmd and some other modules, but could not find any modules where timeouts (if implemented) work in case the called program spawns some children. So I've implememented the described functionality (in a few words: fork, set up new process session in child, run needed program capturing output, terminate whole session from parent in case of timeout). I've talked about the situation to kane@cpan.org but couldn't get his opinion (last response from him came about a month and a half ago). I want my code to be available so people could use it, so I'm thinking of uploading new module, naming it IPC::Exec for example since other names are used. I'm open to including the code into some other module (kane@cpan.org has my code but no the last version) but I'm not sure if authors of other modules can use it because it uses fork which might not work well for different people who might be using IPC::Run, IPC::Cmd and other. Current version of my code is here: http://www.nigilist.ru/nit/temp/Exec.pm Question to the gurus: would you create a new module in such situation? Regards, Petya. ps: here's the list of run/exec modules, I've checked: IPC-Capture-0.06.tar.gz IPC-Cmd-0.50.tar.gz IPC-Open3-Simple-0.04.tar.gz IPC-Open3-Utils-0.5.tar.gz IPC-Run-0.84.tar.gz IPC-Run-SafeHandles-0.02.tar.gz IPC-Run3-0.043.tar.gz IPC-System-Simple-1.18.tar.gz

Sun Oct 11 10:40:46 2009 kane [...] cpan.org - Requestor KANE deleted

Sun Oct 11 10:40:50 2009 kane [...] cpan.org - Requestor petya@nigilist.ru added

Sun Oct 11 10:42:02 2009 kane [...] cpan.org - Cc ADAMK added

Sun Oct 11 10:42:02 2009 kane [...] cpan.org - Cc chris@bingosnet.co.uk added

Sun Oct 11 10:44:36 2009 kane [...] cpan.org - Correspondence added

Hi Petya, I do appologize; I completel forgot about this email you sent. I've added it to the rt.cpan.org bug tracker so we don't lose track of it. Chris & I maintain the IPC::Cmd module and I'm quite open to adding your patch in. Big points are that it must be able to run cross platform, and shouldn't break any existing functionality; IPC::Cmd's a core module now and backwards compatibility and portability are very important. Sorry again for being flaky on this; I should have directed you to the bug tracker right away so things don't fall between the cracks. Let me know what you think.

Sun Oct 11 17:33:05 2009 http://lj.rossia.org/users/nit/ - Correspondence added

Jos, no problems. Sorry for being a bit strict, I really didn't know how to share my idea. Down to the problem. My program works well on linux and freebsd (tested on bunch of different versions). I haven't tested it on other platforms. I could probably make tests on SunOS (Open Solaris). Reading the code of IPC::Cmd I've made a conclusion that we do not need it to run on Windows and VMS since there're no working open3 and fork there (which are essential for my program). I see no problems regarding backward compatibility since you could keep all the old interfaces. Ready to help you with the integration of the code.

Tue Oct 20 05:40:42 2009 kane [...] cpan.org - Correspondence added

On Sun Oct 11 17:33:05 2009, http://lj.rossia.org/users/nit/ wrote: Show quoted text

> Jos, no problems. Sorry for being a bit strict, > I really didn't know how to share my idea. > > Down to the problem. > > My program works well on linux and freebsd > (tested on bunch of different versions). > I haven't tested it on other platforms. > I could probably make tests on SunOS > (Open Solaris). > > Reading the code of IPC::Cmd I've made a conclusion > that we do not need it to run on Windows and VMS > since there're no working open3 and fork there > (which are essential for my program). > > I see no problems regarding backward compatibility > since you could keep all the old interfaces. > > Ready to help you with the integration of the code.

Ok, well let's start with the most recent version of the patch; important is that the code paths can't be triggered from Win32 and then we can feed it to CPAN testers to see if it's stable across platforms.

Wed Nov 04 06:42:26 2009 http://lj.rossia.org/users/nit/ - Correspondence added

From:

Petya Kohts

Show quoted text

> Ok, well let's start with the most recent version of the patch; > important is that the code paths can't be triggered > from Win32 and then we can feed it to CPAN testers > to see if it's stable across platforms.

Attaching most recent version of patch done against IPC::Cmd 0.50. I've created two different calls which were not defined by IPC::Cmd before: run_forked and can_use_run_forked. The plan is to include run_forked into your module but leave current functionality unmodified -- and allow people to choose which interface to use. NB: i've totally disabled function on VMS and Windows, test suite returns ok(1, "platform not supported") for these. Here's usage example: use IPC::Cmd; my $r = IPC::Cmd::run_forked("sleep 4", {'timeout' => 1}); if ($r->{'timeout'}) { print "command timed out\n"; } print $r->{'stdout'} . "\n"; print $r->{'stderr'} . "\n"; Syntax is a bit different from your calls but you could easily adapt it if you need (though I suggest you leave it as it is, as it saves typing). Here's test suite for the function (should be altered to use blib/: #!/usr/bin/perl use strict; use warnings; use Test::More 'no_plan'; #use lib "/root/i"; use IPC::Cmd; if (!IPC::Cmd::can_use_run_forked()) { ok(1, "run_forked not available on this platform"); exit; } my $cmd = "echo out ; echo err >&2 ; sleep 4"; my $r = IPC::Cmd::run_forked($cmd, {'timeout' => 1}); ok(ref($r) eq 'HASH', "executed: $cmd"); ok($r->{'timeout'} eq 1, "timed out"); ok($r->{'stdout'}, "stdout: " . $r->{'stdout'}); ok($r->{'stderr'}, "stderr: " . $r->{'stderr'});

Message body is not shown because it is too large.

Thu Nov 05 18:17:34 2009 BINGOS [...] cpan.org - Correspondence added 30 min

On Wed Nov 04 06:42:26 2009, http://lj.rossia.org/users/nit/ wrote: Show quoted text

> > Ok, well let's start with the most recent version of the patch; > > important is that the code paths can't be triggered > > from Win32 and then we can feed it to CPAN testers > > to see if it's stable across platforms.

> > Attaching most recent version of patch > done against IPC::Cmd 0.50. > > I've created two different calls which > were not defined by IPC::Cmd before: > run_forked and can_use_run_forked. > > The plan is to include run_forked into your module > but leave current functionality unmodified -- > and allow people to choose which interface to use. > > NB: i've totally disabled function on VMS and Windows, > test suite returns ok(1, "platform not supported") for these. >

Hey thanks, Petya. There's a lot of code to review there, but the testcase passes, and I'll look tomorrow at integrating into the existing testsuite. I have the code in my git clone of the svn repository and I'll push a development release to CPAN when I have tests and the documentation done. Cheers.

Thu Nov 05 18:17:42 2009 BINGOS [...] cpan.org - Given to BINGOS

Mon Nov 09 08:07:41 2009 chris [...] bingosnet.co.uk - Correspondence added

CC:	petya [...] nigilist.ru
Subject:	Re: [rt.cpan.org #50398] IPC::Cmd timeout issue
Date:	Mon, 9 Nov 2009 13:07:11 +0000
To:	"http://lj.rossia.org/users/nit/ via RT" <bug-IPC-Cmd [...] rt.cpan.org>
From:	"Chris 'BinGOs' Williams" <chris [...] bingosnet.co.uk>

On Wed, Nov 04, 2009 at 06:42:29AM -0500, http://lj.rossia.org/users/nit/ via RT wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=50398 > >

> > Ok, well let's start with the most recent version of the patch; > > important is that the code paths can't be triggered > > from Win32 and then we can feed it to CPAN testers > > to see if it's stable across platforms.

> > Attaching most recent version of patch > done against IPC::Cmd 0.50. > > I've created two different calls which > were not defined by IPC::Cmd before: > run_forked and can_use_run_forked. > > The plan is to include run_forked into your module > but leave current functionality unmodified -- > and allow people to choose which interface to use. > > NB: i've totally disabled function on VMS and Windows, > test suite returns ok(1, "platform not supported") for these.

Hi, Right, I have applied the patch, integrated the tests into the existing testsuite and released IPC-Cmd-0.51_01 to CPAN. I have also updated the IPC-Cmd that is in perl core to be 0.51_01 Before I go further and release a stable version, I need some documentation on how people would use run_forked(). Many thanks, -- Chris Williams aka BinGOs PGP ID 0x4658671F http://www.gumbynet.org.uk ==========================

Download (untitled)
application/pgp-signature 189b

Message body not shown because it is not plain text.

Thu Nov 12 16:05:11 2009 http://lj.rossia.org/users/nit/ - Correspondence added

Show quoted text

> Before I go further and release a stable version, I need some > documentation on how people would use run_forked().

Here is the documentation; if it's too long -- just take the parts you like best of all. (I feel it's a bit too long, though it explains in details how people could use the function and why they should do this.) General info. ------------- IPC::Cmd::run_forked is used to execute some program, optionally feed it with some input, get its return code and output (both stdout and stderr into seperate buffers). In addition it allows to terminate the program which take too long to finish. The important and distinguishing feature of run_forked is execution timeout which at first seems to be quite a simple task but if you think that the program which you're spawning might spawn some children itself (which in their turn could do the same and so on) it turns out to be not a simple issue. IPC::Cmd::run_forked is designed to survive and successfully terminate almost any long running task, even a fork bomb in case your system has the resources to survive during given timeout. This is achieved by creating separate watchdog process which spawns the specified program in a separate process session and supervises it: optionally feeds it with input, stores its exit code, stdout and stderr, terminates it in case it runs longer than specified. Calling syntax. --------------- The full syntax of run_forked is: my $result = IPC::Cmd::run_forked("/path/command parameters", { "child_stdin" => "some buffer with text", "timeout" => 300, # seconds "stdout_handler" => sub {print $_}, "stderr_handler" => sub {print $_}, }); The $result is the hashref with the following keys: * exit_code * stdout * stderr * timeout (0 if no timeout occured) * err_msg (something you could propagate to the user) You can use run_forked with any combination of the above options or without any options. Find below specific examples with explanations. 1.1) run without timeout (simple run): my $result = IPC::Cmd::run_forked("/path/command parameters"); After function returns you get hashref with the following keys: * exit_code -- holds the exit code of the executed command * stdout -- holds the standard output of the executed command (or empty string if there were no stdout output; it's always defined!) * stderr -- holds the standard error of the executed command (or empty string if there were no stderr output; it's always defined!) * err_msg -- holds some explanation of the results of the execution 1.2) run with timeout: my $result = IPC::Cmd::run_forked("/path/command parameters", {'timeout' => 300}); Timeout could be specified as a hashref key/value pair with key being equal to 'timeout' and value measured in seconds. If the executed program does not return for more than this number of seconds its process session is killed with SIG_KILL (9) which effectively terminates it and all of its children (direct or indirect). 1.3) synchronous processing of stdout and stderr In case you need to synchronously process stdout and stderr while run_forked is being executed you could define stdout_handler and stderr_handler callbacks, for example: sub stdout_handler { my ($buffer) = @_; } sub stderr_handler { my ($buffer) = @_; } my $result = IPC::Cmd::run_forked("/path/command params", { 'stdout_handler' => \&stdout_handler, 'stderr_handler' => \&stderr_handler, }) Now for each portion of data from stdout and stderr of executed program your functions will be called and you will get new data in $buffer. Each callback can be defined separately. Doesn't matter if you use callbacks or not you will get stdout and stderr in the corresponding key/value pairs of $result hash. 1.4) feeding input to the program To feed some input to the stdin of the executed program you use child_stdin key in the options hashref: my $result = IPC::Cmd::run_forked("cat > /tmp/message", { 'child_stdin' => 'hello buddy' }); Which will be the same as running the following command in the shell: echo "hello buddy" | cat > /tmp/message

Thu Dec 24 17:10:28 2009 http://lj.rossia.org/users/nit/ - Correspondence added

Hello Chris and people, hope you're doing well and wish you Merry Christmas and Happy New Year. Chris, thanks for integrating run_forked! I've created a patch for it with which it can run not only external commands but also internal perl code. Could you take it in please? Could we also include run_forked into SYNOPSIS part of the docs otherwise it would be hard for people to find it (I could rearrange and send you diff, if you'd like me to)? Ah! There's a small typo in the docs, run_forked(command => COMMAND, ...) should read as run_forked(COMMAND, ...) because first argument is scalar, not array/hash. Here's the patch to run internal perl code: --- Cmd.pm.orig 2009-11-16 00:43:23.000000000 +0300 +++ Cmd.pm 2009-12-25 00:59:38.000000000 +0300 @@ -810,12 +810,31 @@ close($child_stderr_socket); close($child_info_socket); - my $child_exit_code = open3_run($cmd, { - 'parent_info' => $parent_info_socket, - 'parent_stdout' => $parent_stdout_socket, - 'parent_stderr' => $parent_stderr_socket, - 'child_stdin' => $opts->{'child_stdin'}, - }); + my $child_exit_code; + + # allow both external programs + # and internal perl calls + if (!ref($cmd)) { + $child_exit_code = open3_run($cmd, { + 'parent_info' => $parent_info_socket, + 'parent_stdout' => $parent_stdout_socket, + 'parent_stderr' => $parent_stderr_socket, + 'child_stdin' => $opts->{'child_stdin'}, + }); + } + elsif (ref($cmd) eq 'CODE') { + $child_exit_code = $cmd->({ + 'opts' => $opts, + 'parent_info' => $parent_info_socket, + 'parent_stdout' => $parent_stdout_socket, + 'parent_stderr' => $parent_stderr_socket, + 'child_stdin' => $opts->{'child_stdin'}, + }); + } + else { + print $parent_stderr_socket "Invalid command reference: " . ref($cmd) . "\n"; + $child_exit_code = 1; + } close($parent_stdout_socket); close($parent_stderr_socket); Regards, Petya.

Mon Apr 12 04:04:23 2010 http://lj.rossia.org/users/nit/ - Correspondence added

Hello Chris, thanks for integrating previous patch, hope this is not too opften but here's another one. It fixes obvious typo in previous patch and adds another very useful feature which allows user to specify that he wants the run_forked code (together with the executed prog) to be automatically killed if parent (the one which started run_forked) is killed or dies without waiting for child code. To use this feature user should specify paremeter terminate_on_parent_sudden_death f.e.: my $r = IPC::Cmd::run_forked("some prog here", { 'terminate_on_parent_sudden_death' => 1} ); Here is diff -u with IPC::Cmd version 0.56: --- Cmd.pm.orig 2010-02-03 17:16:47.000000000 +0300 +++ Cmd.pm 2010-04-12 11:59:41.000000000 +0400 @@ -360,6 +360,10 @@ $wait_cycles = $wait_cycles + 1; Time::HiRes::usleep(250000); # half a second } + + if (!$child_finished) { + kill(9, $pid); + } } sub open3_run { @@ -651,7 +655,6 @@ close($parent_stderr_socket); close($parent_info_socket); - my $child_timedout = 0; my $flags; # prepare sockets to read from child @@ -673,11 +676,13 @@ # print "child $pid started\n"; + my $child_timedout = 0; my $child_finished = 0; my $child_stdout = ''; my $child_stderr = ''; my $child_merged = ''; my $child_exit_code = 0; + my $parent_died = 0; my $got_sig_child = 0; $SIG{'CHLD'} = sub { $got_sig_child = time(); }; @@ -685,9 +690,26 @@ my $child_child_pid; while (!$child_finished) { + my $now = time(); + + if ($opts->{'terminate_on_parent_sudden_death'}) { + $opts->{'runtime'}->{'last_parent_check'} = 0 + unless defined($opts->{'runtime'}->{'last_parent_check'}); + + # check for parent once each five seconds + if ($now - $opts->{'runtime'}->{'last_parent_check'} > 5) { + if (getppid() eq "1") { + kill (-9, $pid); + $parent_died = 1; + } + + $opts->{'runtime'}->{'last_parent_check'} = $now; + } + } + # user specified timeout if ($opts->{'timeout'}) { - if (time() - $start_time > $opts->{'timeout'}) { + if ($now - $start_time > $opts->{'timeout'}) { kill (-9, $pid); $child_timedout = 1; } @@ -697,7 +719,7 @@ # kill process after that and finish wait loop; # shouldn't ever happen -- remove this code? if ($got_sig_child) { - if (time() - $got_sig_child > 10) { + if ($now - $got_sig_child > 10) { print STDERR "waitpid did not return -1 for 10 seconds after SIG_CHLD, killing [$pid]\n"; kill (-9, $pid); $child_finished = 1; @@ -776,6 +798,7 @@ 'merged' => $child_merged, 'timeout' => $child_timedout ? $opts->{'timeout'} : 0, 'exit_code' => $child_exit_code, + 'parent_died' => $parent_died, }; my $err_msg = ''; @@ -785,6 +808,9 @@ if ($o->{'timeout'}) { $err_msg .= "ran more than [$o->{'timeout'}] seconds\n"; } + if ($o->{'parent_died'}) { + $err_msg .= "parent died\n"; + } if ($o->{'stdout'}) { $err_msg .= "stdout:\n" . $o->{'stdout'} . "\n"; }

Mon Apr 12 04:06:13 2010 http://lj.rossia.org/users/nit/ - Correspondence added

Show quoted text

> thanks for integrating previous patch, > hope this is not too opften but here's another one. > > It fixes obvious typo in previous patch > and adds another very useful feature

Attaching patch as a file as it is not quite readable as a comment.

Subject:

kohts-ipc-cmd-0.56-terminate_on_parent.diff

--- Cmd.pm.orig 2010-02-03 17:16:47.000000000 +0300 +++ Cmd.pm 2010-04-12 11:59:41.000000000 +0400 @@ -360,6 +360,10 @@ $wait_cycles = $wait_cycles + 1; Time::HiRes::usleep(250000); # half a second } + + if (!$child_finished) { + kill(9, $pid); + } } sub open3_run { @@ -651,7 +655,6 @@ close($parent_stderr_socket); close($parent_info_socket); - my $child_timedout = 0; my $flags; # prepare sockets to read from child @@ -673,11 +676,13 @@ # print "child $pid started\n"; + my $child_timedout = 0; my $child_finished = 0; my $child_stdout = ''; my $child_stderr = ''; my $child_merged = ''; my $child_exit_code = 0; + my $parent_died = 0; my $got_sig_child = 0; $SIG{'CHLD'} = sub { $got_sig_child = time(); }; @@ -685,9 +690,26 @@ my $child_child_pid; while (!$child_finished) { + my $now = time(); + + if ($opts->{'terminate_on_parent_sudden_death'}) { + $opts->{'runtime'}->{'last_parent_check'} = 0 + unless defined($opts->{'runtime'}->{'last_parent_check'}); + + # check for parent once each five seconds + if ($now - $opts->{'runtime'}->{'last_parent_check'} > 5) { + if (getppid() eq "1") { + kill (-9, $pid); + $parent_died = 1; + } + + $opts->{'runtime'}->{'last_parent_check'} = $now; + } + } + # user specified timeout if ($opts->{'timeout'}) { - if (time() - $start_time > $opts->{'timeout'}) { + if ($now - $start_time > $opts->{'timeout'}) { kill (-9, $pid); $child_timedout = 1; } @@ -697,7 +719,7 @@ # kill process after that and finish wait loop; # shouldn't ever happen -- remove this code? if ($got_sig_child) { - if (time() - $got_sig_child > 10) { + if ($now - $got_sig_child > 10) { print STDERR "waitpid did not return -1 for 10 seconds after SIG_CHLD, killing [$pid]\n"; kill (-9, $pid); $child_finished = 1; @@ -776,6 +798,7 @@ 'merged' => $child_merged, 'timeout' => $child_timedout ? $opts->{'timeout'} : 0, 'exit_code' => $child_exit_code, + 'parent_died' => $parent_died, }; my $err_msg = ''; @@ -785,6 +808,9 @@ if ($o->{'timeout'}) { $err_msg .= "ran more than [$o->{'timeout'}] seconds\n"; } + if ($o->{'parent_died'}) { + $err_msg .= "parent died\n"; + } if ($o->{'stdout'}) { $err_msg .= "stdout:\n" . $o->{'stdout'} . "\n"; }

Tue Jun 22 10:37:30 2010 http://lj.rossia.org/~nit/ - Correspondence added

Hello Chris, thanks again for handling patches. I've got a bit of new stuff which allows more flebixle configuration of termination of run_forked and its children. These are three parameters: 1) terminate_on_signal -- specifies that run_forked should stop child if it receives specified signal (not compatible with POSIX::SigAction, which doesn't update %SIG) 2) terminate_wait_time -- specifies how long run_forked waits after sending TERM to its children before sending KILL 3) clean_up_children -- specifies than run_forked should end all the (possible) children processes which were spawned by child, sending KILL signal to whole child process group after child is stopped. Please find attached patch. Luck, Petya.

Subject:

kohts-cmd-termination.patch

--- Cmd.pm.orig 2010-04-29 23:47:00.000000000 +0400 +++ Cmd.pm 2010-06-22 17:36:21.000000000 +0400 @@ -340,29 +340,89 @@ return $CAN_USE_RUN_FORKED eq "1"; } +# incompatible with POSIX::SigAction +# +sub install_layered_signal { + my ($s, $handler_code) = @_; + + my %available_signals = map {$_ => 1} keys %SIG; + + die("install_layered_signal got nonexistent signal name [$s]") + unless defined($available_signals{$s}); + die("install_layered_signal expects coderef") + if !ref($handler_code) || ref($handler_code) ne 'CODE'; + + my $previous_handler = $SIG{$s}; + + my $sig_handler = sub { + my ($called_sig_name, @sig_param) = @_; + + # $s is a closure refering to real signal name + # for which this handler is being installed. + # it is used to distinguish between + # real signal handlers and aliased signal handlers + my $signal_name = $s; + + # $called_sig_name is a signal name which + # was passed to this signal handler; + # it doesn't equal $signal_name in case + # some signal handlers in %SIG point + # to other signal handler (CHLD and CLD, + # ABRT and IOT) + # + # initial signal handler for aliased signal + # calles some other signal handler which + # should not execute the same handler_code again + if ($called_sig_name eq $signal_name) { + $handler_code->($signal_name); + } + + # run original signal handler if any (including aliased) + # + if (ref($previous_handler)) { + $previous_handler->($called_sig_name, @sig_param); + } + }; + + $SIG{$s} = $sig_handler; +} + # give process a chance sending TERM, # waiting for a while (2 seconds) # and killing it with KILL sub kill_gently { - my ($pid) = @_; + my ($pid, $opts) = @_; + + $opts = {} unless $opts; + $opts->{'wait_time'} = 2 unless defined($opts->{'wait_time'}); + $opts->{'first_kill_type'} = 'just_process' unless $opts->{'first_kill_type'}; + $opts->{'final_kill_type'} = 'just_process' unless $opts->{'final_kill_type'}; - kill(15, $pid); + if ($opts->{'first_kill_type'} eq 'just_process') { + kill(15, $pid); + } + elsif ($opts->{'first_kill_type'} eq 'process_group') { + kill(-15, $pid); + } - my $wait_cycles = 0; my $child_finished = 0; + my $wait_start_time = time(); - while (!$child_finished && $wait_cycles < 8) { + while (!$child_finished && $wait_start_time + $opts->{'wait_time'} > time()) { my $waitpid = waitpid($pid, WNOHANG); if ($waitpid eq -1) { $child_finished = 1; } - - $wait_cycles = $wait_cycles + 1; - Time::HiRes::usleep(250000); # half a second + Time::HiRes::usleep(250000); # quarter of a second } if (!$child_finished) { - kill(9, $pid); + if ($opts->{'final_kill_type'} eq 'just_process') { + kill(9, $pid); + } + elsif ($opts->{'final_kill_type'} eq 'process_group') { + kill(-9, $pid); + } } } @@ -454,9 +514,16 @@ } if ($got_sig_child) { - if (time() - $got_sig_child > 10) { - print STDERR "select->can_read did not return 0 for 10 seconds after SIG_CHLD, killing [$pid]\n"; - kill (-9, $pid); + if (time() - $got_sig_child > 1) { + # select->can_read doesn't return 0 after SIG_CHLD + # + # "On POSIX-compliant platforms, SIGCHLD is the signal + # sent to a process when a child process terminates." + # http://en.wikipedia.org/wiki/SIGCHLD + # + # nevertheless kill KILL wouldn't break anything here + # + kill (9, $pid); $child_finished = 1; } } @@ -491,8 +558,9 @@ waitpid($pid, 0); - # i've successfully reaped my child, - # let my parent know this + # since we've successfully reaped the child, + # let our parent know about this. + # if ($opts->{'parent_info'}) { my $ps = $opts->{'parent_info'}; print $ps "reaped $pid\n"; @@ -629,6 +697,7 @@ $opts = {} unless $opts; $opts->{'timeout'} = 0 unless $opts->{'timeout'}; + $opts->{'terminate_wait_time'} = 2 unless defined($opts->{'terminate_wait_time'}); # sockets to pass child stdout to parent my $child_stdout_socket; @@ -696,8 +765,13 @@ my $parent_died = 0; my $got_sig_child = 0; + my $got_sig_quit = 0; $SIG{'CHLD'} = sub { $got_sig_child = time(); }; + if ($opts->{'terminate_on_signal'}) { + install_layered_signal($opts->{'terminate_on_signal'}, sub { $got_sig_quit = time(); }); + } + my $child_child_pid; while (!$child_finished) { @@ -737,6 +811,15 @@ } } + if ($got_sig_quit) { + kill_gently ($pid, { + 'first_kill_type' => 'process_group', + 'final_kill_type' => 'process_group', + 'wait_time' => $opts->{'terminate_wait_time'} + }); + $child_finished = 1; + } + my $waitpid = waitpid($pid, WNOHANG); # child finished, catch it's exit status @@ -762,7 +845,7 @@ } while (my $l = <$child_stdout_socket>) { - if (!$opts->{discard_output}) { + if (!$opts->{'discard_output'}) { $child_stdout .= $l; $child_merged .= $l; } @@ -772,7 +855,7 @@ } } while (my $l = <$child_stderr_socket>) { - if (!$opts->{discard_output}) { + if (!$opts->{'discard_output'}) { $child_stderr .= $l; $child_merged .= $l; } @@ -800,6 +883,23 @@ kill_gently($child_child_pid); } + # in case there are forks in child which + # do not forward or process signals (TERM) correctly + # kill whole child process group, effectively trying + # not to return with some children or their parts still running + # + # to be more accurate -- we need to be sure + # that this is process group created by our child + # (and not some other process group with the same pgid, + # created just after death of our child) -- fortunately + # this might happen only when process group ids + # are reused quickly (there are lots of processes + # spawning new process groups for example) + # + if ($opts->{'clean_up_children'}) { + kill(-9, $pid); + } + # print "child $pid finished\n"; close($child_stdout_socket); @@ -812,7 +912,8 @@ 'merged' => $child_merged, 'timeout' => $child_timedout ? $opts->{'timeout'} : 0, 'exit_code' => $child_exit_code, - 'parent_died' => $parent_died, + 'parent_died' => $parent_died, + 'child_pgid' => $pid, }; my $err_msg = '';

Tue Oct 19 10:30:05 2010 BINGOS [...] cpan.org - Correspondence added 30 min

Thanks, patches have been applied and released. Changes for 0.60 Mon Jul 5 09:04:54 BST 2010 ================================================= * Corrected spelling mistakes in POD, spotted by H.Merijn Brand * Apply a patch from Burak Gursoy RT #58886, which fixes paths on MSWin32 * Apply patch from Petya Kohts, RT #50398, which allows more flexible configuration of run_forked and its children Changes for 0.58 Thu Apr 29 20:49:07 BST 2010 ================================================= * Applied patch from Petya Kohts, RT #50398, which adds 'terminate_on_parent_sudden_death' option to run_forked(). * Applied patches from David Morel RT #56973, which add 'discard_output' option to run_forked(). * Added documentation as suggested by Rafa<C3><AB>l Garcia-Suarez in RT #56973 Changes for 0.56 Wed Feb 3 14:17:00 GMT 2010 ================================================= * Applied patch from Petya Kohts, RT #50398, which updates run_forked() to allow coderefs Many thanks.

Tue Oct 19 10:30:06 2010 BINGOS [...] cpan.org - Status changed from 'open' to 'resolved'

Wed Apr 09 18:48:11 2014 avar [...] cpan.org - Correspondence added

On 2009-11-04T06:42:26-05:00, http://lj.rossia.org/users/nit/ wrote: Show quoted text

> > Ok, well let's start with the most recent version of the patch; > > important is that the code paths can't be triggered > > from Win32 and then we can feed it to CPAN testers > > to see if it's stable across platforms.

> > Attaching most recent version of patch > done against IPC::Cmd 0.50. > > I've created two different calls which > were not defined by IPC::Cmd before: > run_forked and can_use_run_forked. > > The plan is to include run_forked into your module > but leave current functionality unmodified -- > and allow people to choose which interface to use. > > NB: i've totally disabled function on VMS and Windows, > test suite returns ok(1, "platform not supported") for these. > > > Here's usage example: > > use IPC::Cmd; > > my $r = IPC::Cmd::run_forked("sleep 4", {'timeout' => 1}); > if ($r->{'timeout'}) { > print "command timed out\n"; > } > print $r->{'stdout'} . "\n"; > print $r->{'stderr'} . "\n"; > > Syntax is a bit different from your calls > but you could easily adapt it if you need > (though I suggest you leave it as it is, > as it saves typing). > > > Here's test suite for the function > (should be altered to use blib/: > #!/usr/bin/perl > > use strict; > use warnings; > > use Test::More 'no_plan'; > > #use lib "/root/i"; > use IPC::Cmd; > > if (!IPC::Cmd::can_use_run_forked()) { > ok(1, "run_forked not available on this platform"); > exit; > } > > my $cmd = "echo out ; echo err >&2 ; sleep 4"; > my $r = IPC::Cmd::run_forked($cmd, {'timeout' => 1}); > > ok(ref($r) eq 'HASH', "executed: $cmd"); > ok($r->{'timeout'} eq 1, "timed out"); > ok($r->{'stdout'}, "stdout: " . $r->{'stdout'}); > ok($r->{'stderr'}, "stderr: " . $r->{'stderr'});

This code had a bad bug related to signal clobbering that I fixed in https://github.com/jib/ipc-cmd/pull/13