Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Test-TCP CPAN distribution.

Report information
The Basics
Id: 66016
Status: open
Priority: 0/
Queue: Test-TCP

People
Owner: Nobody in particular
Requestors: mkanat [...] cpan.org
Cc: JDB [...] cpan.org
AdminCc:

Bug Information
Severity: Critical
Broken in: 1.11
Fixed in: (no value)



Subject: Tests fail on Windows (even when they pass?)
It looks like the tests for Test-TCP frequently end with "Dubious, test returned 9 (wstat 2304, 0x900)" even when they have passed, on Windows: http://ppm4.activestate.com/MSWin32-x86/5.12/1200/T/TO/TOKUHIROM/Test- TCP-1.11.d/log-20101221T221845.txt This looks to be the current blocker for getting Plack to be installable on Windows via PPM.
From: bo.johansson [...] lsn.se
Vid Mon, 21 Feb 2011 kl. 21.09.29, skrev MKANAT: Show quoted text
> It looks like the tests for Test-TCP frequently end with "Dubious, test > returned 9 (wstat 2304, 0x900)" even when they have passed, on Windows: > > http://ppm4.activestate.com/MSWin32-x86/5.12/1200/T/TO/TOKUHIROM/Test- > TCP-1.11.d/log-20101221T221845.txt > > This looks to be the current blocker for getting Plack to be installable > on Windows via PPM.
Had also problems with the test t/08_exit.t. It is blocking on Windows 7. Trying to find the problem using warn statements, I found it seemed to work much better. Added a sleep statement in t/08_exit.t (marked below) and the test run much better. Is it a timing problem? else { # child test_tcp( client => sub { my $port = shift; note "CLIENT: $$"; sleep 1; <======== exit 1; },
Subject: Re: [rt.cpan.org #66016] Tests fail on Windows (even when they pass?)
Date: Thu, 3 Mar 2011 12:38:23 +0900
To: bug-Test-TCP [...] rt.cpan.org
From: Tokuhiro Matsuno <tokuhirom [...] gmail.com>
This is a issue around leaking $?. But Test::TCP already added "local $?" in process management. It's maybe ActivePerl's issue. On Tue, Feb 22, 2011 at 11:09 AM, Max Kanat-Alexander via RT <bug-Test-TCP@rt.cpan.org> wrote: Show quoted text
> Mon Feb 21 21:09:29 2011: Request 66016 was acted upon. > Transaction: Ticket created by MKANAT >       Queue: Test-TCP >     Subject: Tests fail on Windows (even when they pass?) >   Broken in: 1.11 >    Severity: Critical >       Owner: Nobody >  Requestors: mkanat@cpan.org >      Status: new >  Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=66016 > > > > It looks like the tests for Test-TCP frequently end with "Dubious, test > returned 9 (wstat 2304, 0x900)" even when they have passed, on Windows: > > http://ppm4.activestate.com/MSWin32-x86/5.12/1200/T/TO/TOKUHIROM/Test- > TCP-1.11.d/log-20101221T221845.txt > > This looks to be the current blocker for getting Plack to be installable > on Windows via PPM. >
RT-Send-CC: tokuhirom [...] gmail.com, bo.johansson [...] lsn.se
On Wed Mar 02 22:38:33 2011, tokuhirom@gmail.com wrote: Show quoted text
> This is a issue around leaking $?. But Test::TCP already added "local > $?" in process management. It's maybe ActivePerl's issue.
I can reproduce the problem with the latest bleadperl, compiled with Microsoft VC6 on Windows 2000. The problem is *not* a leaking $?. I have verified that the return value of the Perl main() function is actually correct (0). However, the waitpid() function in the parent does indeed sometimes receive a different value (9). This only happens when the code is running under Test::Harness. I have further simplified the setup to trigger the issue to this driver program, using IPC::Open3 directly (the same way TAP::Parser does): #!perl use strict; use warnings; use IPC::Open3; use IO::Handle; $ENV{PERL5LIB} = "blib/lib"; my $out = IO::Handle->new; my $pid = open3("<&STDIN", $out, ">&STDERR", @ARGV); while () { last unless defined( my $line = <$out> ); print "# $line"; } my $wpid = waitpid( $pid, 0 ); print <<EOT; pid=$pid wpid=$wpid \$?=$? EOT __END__ Run with `perl tst.pl perl -w t/04_die.t` The problem goes away if instead of the $out filehandle I simply pass ">&STDOUT" (and remove the loop reading from $out). I've starred for a long time at the win32_waitpid() implementation, and can't find any fault there. I also did lots more experiments, calling ExitProcess() directly to bypass CRT cleanup etc, but the issue never went away. So I have no idea where the code lives, that seems to be changing the exit status post-mortem. It does feel like a bug in either Windows, or the MSVCRT runtime library. However, I did find a workaround to get all the tests to pass: adding a simple sleep(1) call to the end of Test::TCP::stop() makes all tests pass; the $? fiddling done by the latest release is not necessary. So I would recommend adding something like this: sleep(1) if $^O eq "MSWin32"; and forget about it, even if we don't manage to fully understand it.
On Tue Mar 08 21:03:54 2011, JDB wrote: Show quoted text
> So I would recommend adding something like this: > > sleep(1) if $^O eq "MSWin32"; > > and forget about it, even if we don't manage to fully understand it.
I added this patch via the distroprefs mechanism to the ActiveState PPM build machines, and all Windows builds succeeded now: http://code.activestate.com/ppm/Test-TCP/ Sample build log: http://ppm4.activestate.com/MSWin32-x86/5.12/1200/T/TO/TOKUHIROM/Test-TCP-1.12.d/log-20110308T181913.txt
From: bo.johansson [...] lsn.se
Vid ons, 09 Mar 2011 kl. 13.59.58, skrev JDB: Show quoted text
> On Tue Mar 08 21:03:54 2011, JDB wrote:
> > So I would recommend adding something like this: > > > > sleep(1) if $^O eq "MSWin32"; > > > > and forget about it, even if we don't manage to fully understand it.
> > I added this patch via the distroprefs mechanism to the ActiveState > PPM > build machines, and all Windows builds succeeded now: > > http://code.activestate.com/ppm/Test-TCP/ > > Sample build log: > > http://ppm4.activestate.com/MSWin32-x86/5.12/1200/T/TO/TOKUHIROM/Test- > TCP-1.12.d/log-20110308T181913.txt
I have used the proposed change and most of the tests are still blocking. The test 08_exit.t is blocking about one of three times. The tests 01_simple.t t/09_fork.t t/10_oo.t are blocking with lower frequency of about one of 60 times. I am using: Strawberry-perl-5.12.2.0 Perl 5, version 12, subversion 2 (v5.12.2) built for MSWin32-x86-multi-thread Windows 7 Home Premium with Service Pack 1 See also https://rt.cpan.org/Public/Bug/Display.html?id=66437
RT-Send-CC: tokuhirom [...] gmail.com, bo.johansson [...] lsn.se
Alright, I think I've figured it all out now... Here are 2 minimal test cases for ActivePerl that reproduce the '9' exit code: Windows 2000: system $^X, "-e", "if (\$pid=fork){sleep 1;kill(9, \$pid)} else {sleep 5}"; print $?>>8, "\n"; Windows 7: system $^X, "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}"; print $?>>8, "\n"; The only difference is the additional sleep() before kill() on Windows 2000. The Windows 7 test case however prints '0' on Windows 2000, and vice versa, the Windows 2000 test case prints '0' on Windows 7 too. The problem does not manifest itself under Strawberry Perl 5.12.1 on Windows 7 (and Strawberry doesn't install on Windows 2000). The problem can be fixed in both cases by adding a sleep(1) after the kill() call. Further testing shows that it is actually enough to just give up the remainder of the timeslice to resolve the issue: Win32::Sleep(0); Looking at the implementation of win32_kill() again I see that we are actually passing the signal number as the thread exit code: case 9: /* kill -9 style un-graceful exit */ if (TerminateThread(hProcess, sig)) { remove_dead_pseudo_process(child); return 0; } So what this now looks like is that we are killing the other thread with an exit code of 9, and if we manage to call ExitProcess(0) in the remainder of our own timeslice, then the Windows scheduler will do the final cleanup of the child thread after we have terminated the main thread, replacing our exit code (0) with the exit code of the child process (9). I haven't tested this further, but I assume that Strawberry Perl, due to being compiled with a different compiler, will do something that always yields to the scheduler before calling ExitProcess(), thereby avoiding this race condition. I'm going to fix this in core Perl by adding a Sleep(0) call after the TerminateThread() call for the Perl 5.14 release. But for earlier releases of ActivePerl it would still be nice if Test-TCP could add a call to sleep(1) to prevent these bogus test failures (both for itself, and the modules using it, like Plack). If you want to avoid the additional delay for each test, change the sleep(1) to: Win32::Sleep(0) if $^O eq "MSWin32"; Win32::Sleep() is always defined on Windows; there is no need to use/require the Win32 module first. And don't try sleep(0), it is implemented differently and won't give up the remainder of the timeslice. As a further note, since this problem affects the tests of other modules too (e.g. HTTP::Server::Simple), a good workaround for current ActivePerl versions is to add this line: END { Win32::Sleep(0) } to your Perl/site/lib/sitecustomize.pl file. Both Test-TCP-1.12 and HTTP-Server-Simple-0.43 build and test without problems for me with this modification. I'll add this workaround to the ActiveState PPM Windows build machines.
On Thu Mar 10 18:54:38 2011, JDB wrote: Show quoted text
> I'm going to fix this in core Perl by adding a Sleep(0) call after the > TerminateThread() call for the Perl 5.14 release.
http://perl5.git.perl.org/perl.git/commitdiff/82e24582
From: bo.johansson [...] lsn.se
I have done some test using Win32::Sleep(0); and some inspired by "if (\$pid=fork){sleep 1;kill(9, \$pid)} else {sleep 5}". I have also tried to find a improved workaround using Win32::Sleep(0); in the tests of Test::TCP. See 3 below. A workaround which only changes the stop method in TCP.pm seems not to be enough. The in https://rt.cpan.org/Public/Bug/Display.html?id=66437 proposed changes to the tests is probably also necessary. Here are the results. Hope it can give some information in the search of the problem causes. 1) This is always blocking: perl -e "if ($pid=fork){kill(9, $pid);} else {sleep 1000}" (Should the corresponding test be added to the tests of the core Perl) 2) This is sometimes blocking and sometimes giving the exit value 9: perl -e "if ($pid=fork){Win32::Sleep(0); kill(9, $pid);} else {sleep 1000}" Using the batch file below I get the following result: - Blocking after 725 loops. - Blocking after 1335 loops. - Exiting with the value 9 after 568 loops. - Exiting with the value 9 after 5 loops.* - Exiting with the value 9 after 3 loops.* - Exiting with the value 9 after 14 loops.* - Exiting with the value 9 after 6 loops.* - Blocking after 3878 loops. - Blocking after 269 loops. - Exiting with the value 9 after 678 loops. - Blocking after 1472 loops. * this was run in sequence in one cmd.exe window. Is there a problem with some initialization of variables??!! Batchflie: @echo off set count=0 :loop set /a count=%count%+1 echo Count %count% @echo on perl -e "if ($pid=fork){Win32::Sleep(0); kill(9, $pid);} else {sleep 1000}" if errorlevel 1 goto exit @echo off goto loop :exit ECHO.%ERRORLEVEL% 3) Running tests with modifications in TCP.pm Got the following result running some test, using a batch file and with modifications in TCP.pm: Test 01_simple.t is blocking after 343 loops Test 08_exit.t is blocking after 713 loops Test 10_oo.t is blocking after 481 loops Test 01_simple.t is blocking after 423 loops Test 10_oo.t is blocking after 2067 loops Test 08_exit.t is blocking after 1434 loops Test 08_exit.t is blocking after 87 loops Test 08_exit.t is blocking after 3258 loops Used batch file: @echo off set count=0 :loop set /a count=%count%+1 echo Count %count% @echo on perl -Ilib t/01_simple.t perl -Ilib t/08_exit.t perl -Ilib t/09_fork.t perl -Ilib t/10_oo.t @echo off goto loop With the two marked lines added in TCP.pm: sub stop { my $self = shift; return unless defined $self->{pid}; return unless $self->{_my_pid} == $$; Win32::Sleep(0) if $^O eq "MSWin32"; # added line kill $TERMSIG => $self->{pid}; Win32::Sleep(0) if $^O eq "MSWin32"; # added line local $?; # waitpid modifies original $?. I am using: - Strawberry-perl-5.12.2.0 - Perl 5, version 12, subversion 2 (v5.12.2) built for MSWin32-x86-multi-thread - Windows 7 Home Premium with Service Pack 1 See also https://rt.cpan.org/Public/Bug/Display.html?id=66437
CC: <JDB [...] cpan.org>
Subject: RE: [rt.cpan.org #66016] Tests fail on Windows (even when they pass?)
Date: Mon, 14 Mar 2011 10:14:12 -0700
To: <bug-Test-TCP [...] rt.cpan.org>, <mkanat [...] cpan.org>
From: "Jan Dubois" <jand [...] activestate.com>
On Mon, 14 Mar 2011, Bo Johansson via RT wrote: Show quoted text
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=66016 > > > I have done some test using Win32::Sleep(0); and some inspired by "if > (\$pid=fork){sleep 1;kill(9, \$pid)} else {sleep 5}". > > I have also tried to find a improved workaround using Win32::Sleep(0); > in the tests of Test::TCP. See 3 below. > A workaround which only changes the stop method in TCP.pm seems not to > be enough. The in https://rt.cpan.org/Public/Bug/Display.html?id=66437 > proposed changes to the tests is probably also necessary. > > Here are the results. Hope it can give some information in the search of > the problem causes. > > 1) This is always blocking: > perl -e "if ($pid=fork){kill(9, $pid);} else {sleep 1000}"
On Windows 7, but not on Windows 2000. It is a race condition, and the timing seems to depend on details of the scheduler. Show quoted text
> (Should the corresponding test be added to the tests of the core Perl)
I already did this (together with adding a Sleep(0) after calling TerminateThread() whenever a pseudo-process is killed with kill(9,...)): http://perl5.git.perl.org/perl.git/commitdiff/82e24582 Show quoted text
> 2) This is sometimes blocking and sometimes giving the exit value 9: > perl -e "if ($pid=fork){Win32::Sleep(0); kill(9, $pid);} else {sleep 1000}"
Yes, to prevent the exit value of 9 the Win32::Sleep(0) has to be *after* the kill(). [...] Show quoted text
> 3) Running tests with modifications in TCP.pm > > Got the following result running some test, using a batch file and with > modifications in TCP.pm:
[...] Show quoted text
> With the two marked lines added in TCP.pm: > > sub stop { > my $self = shift; > > return unless defined $self->{pid}; > return unless $self->{_my_pid} == $$; > Win32::Sleep(0) if $^O eq "MSWin32"; # added line
That line should not be necessary (at least not for the exit value of '9' problem. Show quoted text
> kill $TERMSIG => $self->{pid}; > Win32::Sleep(0) if $^O eq "MSWin32"; # added line > local $?; # waitpid modifies original $?. > > I am using: > - Strawberry-perl-5.12.2.0 > - Perl 5, version 12, subversion 2 (v5.12.2) built for > MSWin32-x86-multi-thread > - Windows 7 Home Premium with Service Pack 1
Thanks for providing the batch file. I'll try to run it on my VMs to see if I can reproduce the blocking. Cheers, -Jan
From: bo.johansson [...] lsn.se
Hej Jan! Thank you for your patience! I have very little knowledge about the internal of perl. 1) You wrote: Show quoted text
>Windows 7: >system $^X, "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}"; >print $?>>8, "\n";
Is it a problem if it is blocking and there is no print at all? 2) Using the batch file: @echo off set count=0 :loop set /a count=%count%+1 echo Count %count% @echo on perl -Ilib t/01_simple.t perl -Ilib t/08_exit.t perl -Ilib t/09_fork.t perl -Ilib t/10_oo.t @echo off goto loop and with "added line removed" in: sub stop { my $self = shift; return unless defined $self->{pid}; return unless $self->{_my_pid} == $$; #Win32::Sleep(0) if $^O eq "MSWin32"; # added line removed kill $TERMSIG => $self->{pid}; Win32::Sleep(0) if $^O eq "MSWin32"; # added line local $?; # waitpid modifies original $?. gave the results: - Test 09_fork.t is blocking in loop 1. - Test 09_fork.t is blocking in loop 4. - Test 09_fork.t is blocking in loop 2. - Test 09_fork.t is blocking in loop 2. - Test 09_fork.t is blocking in loop 1. - Test 09_fork.t is blocking in loop 1. - Test 09_fork.t is blocking in loop 2. Removed "perl -Ilib t/09_fork.t" in batch file. - Test 08_exit.t is blocking in loop 8. - Test 08_exit.t is blocking in loop 9. - Test 08_exit.t is blocking in loop 13. - Test 08_exit.t is blocking in loop 5. Conclusion: This indicates that the first Win32::Sleep(0) if $^O eq "MSWin32"; line is needed to prevent the blocking. 3) Running this batch file: @echo off set count=0 :loop set /a count=%count%+1 echo Count %count% @echo on perl -e "if ($pid=fork){Win32::Sleep(0); kill(9, $pid); Win32::Sleep(0)} else {sleep 1000}" if errorlevel 1 goto exit @echo off goto loop :exit ECHO.%ERRORLEVEL% (second "Win32::Sleep(0)" is added) gave the results: - Blocking after 2170 loops. - Blocking after 449 loops. - Blocking after 38 loops. - Blocking after 58 loops. - Blocking after 2108 loops. - Exiting with the value 9 after 3 loops. - Exiting with the value 9 after 7 loops. - Exiting with the value 9 after 7 loops. - Exiting with the value 9 after 1 loops. - Exiting with the value 9 after a few loops was repeated more than 20 times using the same command windows. Is there some missing initialization???!!! - Blocking after 2806 loops. - Blocking after 353 loops. - Blocking after 1737 loops. - Run more than 7000 loops without problems. Conclusion: There are still "exiting with the value 9". Removing the first Win32::Sleep(0) resulted in immediate blocking. Have a nice day! Bosse
CC: <JDB [...] cpan.org>
Subject: RE: [rt.cpan.org #66016] Tests fail on Windows (even when they pass?)
Date: Mon, 14 Mar 2011 17:20:56 -0700
To: <bug-Test-TCP [...] rt.cpan.org>, <mkanat [...] cpan.org>
From: "Jan Dubois" <jand [...] activestate.com>
On Mon, 14 Mar 2011, Bo Johansson via RT wrote: Show quoted text
> 1) You wrote: >
> >Windows 7: > >system $^X, "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}"; > >print $?>>8, "\n";
> > Is it a problem if it is blocking and there is no print at all?
Yes, that is a problem. Note though that kill(9, $child) is inherently unsafe for pseudo-processes. You should only ever do this if you know that $child is in a known "blocking" state. E.g. system $^X, "-e", "if (\$pid=fork){sleep 1;kill(9, \$pid)} else {sleep 5}"; The sleep(1) before the kill() should give the child process enough time to reach the sleep(5) call before getting killed. Here is the list of warnings about some of the things that can go wrong if you call TerminateThread() when you don't know the current state of the thread (from the Microsoft docs): * If the target thread owns a critical section, the critical section will not be released. * If the target thread is allocating memory from the heap, the heap lock will not be released. * If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent. * If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL. As the terminated thread runs inside the same process as the controlling thread, it is always possible that the controlling thread will get damaged by this action. I've suggested an alternative to perl5-porters, that would allow us to SIGTERM the child and then simply exit from the parent and leave the thread cleanup to the OS. That should work much cleaner, because it should never affect the parent process. Show quoted text
> 2) Using the batch file: > @echo off > set count=0 > :loop > set /a count=%count%+1 > echo Count %count% > @echo on > perl -Ilib t/01_simple.t > perl -Ilib t/08_exit.t > perl -Ilib t/09_fork.t > perl -Ilib t/10_oo.t > @echo off > goto loop
I have not tested with the Test::TCP tests any further, but... [...] Show quoted text
> 3) Running this batch file: > > @echo off > set count=0 > :loop > set /a count=%count%+1 > echo Count %count% > @echo on > perl -e "if ($pid=fork){Win32::Sleep(0); kill(9, $pid); Win32::Sleep(0)} > else {sleep 1000}" > if errorlevel 1 goto exit > @echo off > goto loop > :exit > ECHO.%ERRORLEVEL% > > (second "Win32::Sleep(0)" is added) gave the results: > > - Blocking after 2170 loops. > - Blocking after 449 loops. > - Blocking after 38 loops. > - Blocking after 58 loops. > - Blocking after 2108 loops. > - Exiting with the value 9 after 3 loops. > - Exiting with the value 9 after 7 loops. > - Exiting with the value 9 after 7 loops. > - Exiting with the value 9 after 1 loops. > - Exiting with the value 9 after a few loops was repeated more than 20 > times using the same command windows. > Is there some missing initialization???!!! > - Blocking after 2806 loops. > - Blocking after 353 loops. > - Blocking after 1737 loops. > - Run more than 7000 loops without problems. > > Conclusion: There are still "exiting with the value 9". > > Removing the first Win32::Sleep(0) resulted in immediate blocking.
I've been running this, and variations of it on my Windows 7 Pro SP1 VM, and only had it exit with value "9" once after more than 24K interations. So I see that it can still happen, but it is really hard to repro. I was hoping to be able to see the blocking state, so I could use a debugger to check the stack traces from all threads to determine the cause of the deadlock, but have not been able to repo this at all. Anyways, I don't think TerminateThread() can ever be completely safe if you call it when the child thread may not be in a wait state, so I hope the alternate implementation will solve the issue instead (for 5.14 and later only, for 5.12 and earlier we will have to add the sleep() calls and live with the occasional breakage, I think). Cheers, -Jan
op/fork.t test if failing on my Windows 7/64bit box
- perl-5.14.0-RC1
- built with mingw-w64.sf.net gcc-4.4.3 toolchain
- the same failure for 32bit and 64bit perl

op/fork.t ......................................................... 23/? PROG:
# Windows 7: https://rt.cpan.org/Ticket/Display.html?id=66016#txn-908976
system $^X, "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}";
print $?>>8, "\n";
EXPECTED:
0
GOT:
9 -- kmx
Subject: Re: [rt.cpan.org #66016] Tests fail on Windows (even when they pass?)
Date: Wed, 27 Apr 2011 11:12:54 +0900
To: bug-Test-TCP [...] rt.cpan.org
From: Tokuhiro Matsuno <tokuhirom [...] gmail.com>
Can anyone send a pull-req on github? repo is here: https://github.com/tokuhirom/test-tcp On Wed, Apr 27, 2011 at 1:23 AM, kmx via RT <bug-Test-TCP@rt.cpan.org> wrote: Show quoted text
>       Queue: Test-TCP >  Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=66016 > > > op/fork.t test if failing on my Windows 7/64bit box > - perl-5.14.0-RC1 > - built with mingw-w64.sf.net gcc-4.4.3 toolchain > - the same failure for 32bit and 64bit perl > > > op/fork.t ......................................................... 23/? PROG: > # Windows 7: https://rt.cpan.org/Ticket/Display.html?id=66016#txn-908976 > system $^X,  "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}"; > print $?>>8, "\n"; > EXPECTED: > 0 > GOT: > 9 > -- > kmx > >
From: bo.johansson [...] lsn.se
Vid Fre, 11 Mar 2011 kl. 13.43.42, skrev JDB: Show quoted text
> On Thu Mar 10 18:54:38 2011, JDB wrote:
> > I'm going to fix this in core Perl by adding a Sleep(0) call after the > > TerminateThread() call for the Perl 5.14 release.
> > http://perl5.git.perl.org/perl.git/commitdiff/82e24582
I have got an improvement proposal to the patch see http://www.perlmonks.org/?node_id=1022657. I have no possibilities to test the proposal. Could someone test it and if it work create a patch?