Bug #73498 for IO-Async-Loop-Epoll: libio-async-loop-epoll-perl: FTBFS: Failed 1/8 test programs. 1/86 subtests failed.

Tue Dec 27 09:26:23 2011 CARNIL [...] cpan.org - Ticket created

From:	CARNIL [...] cpan.org
Subject:	libio-async-loop-epoll-perl: FTBFS: Failed 1/8 test programs. 1/86 subtests failed.

Hi Paul This bug has been forwarded from http://bugs.debian.org/646649 I can reproduce the issue, but cannot find a hint why it is failing. I see that are too one cpan tester failing with the same test failing [1], but there additional one further test. [1] http://www.cpantesters.org/cpan/report/6f60c70c-2bd0-11e1-be17-7e2249abe0af Source: libio-async-loop-epoll-perl Version: 0.11-1 Severity: serious Tags: wheezy sid User: debian-qa@lists.debian.org Usertags: qa-ftbfs-20111022 qa-ftbfs Justification: FTBFS on amd64 Hi, During a rebuild of all packages in sid, your package failed to build on amd64. Relevant part: Show quoted text

> debian/rules build > dh build > dh_testdir > dh_auto_configure > Creating new 'MYMETA.yml' with configuration results > Creating new 'Build' script for 'IO-Async-Loop-Epoll' version '0.11' > dh_auto_build > Building IO-Async-Loop-Epoll > dh_auto_test > t/00use.t ........... ok > t/01loop-io.t ....... ok > > # Failed test 'Other timers still fire after self-cancelling one' > # at /usr/share/perl5/IO/Async/LoopTests.pm line 458. > # got: undef > # expected: '1' > # Looks like you failed 1 test of 15. > t/02loop-timer.t .... > Dubious, test returned 1 (wstat 256, 0x100) > Failed 1/15 subtests > t/03loop-signal.t ... ok > t/04loop-idle.t ..... ok > t/05loop-child.t .... ok > t/06loop-control.t .. ok > t/99pod.t ........... ok > > Test Summary Report > ------------------- > t/02loop-timer.t (Wstat: 256 Tests: 15 Failed: 1) > Failed test: 14 > Non-zero exit status: 1 > Files=8, Tests=86, 11 wallclock secs ( 0.04 usr 0.01 sys + 0.51 cusr 0.10 csys = 0.66 CPU) > Result: FAIL > Failed 1/8 test programs. 1/86 subtests failed. > dh_auto_test: perl Build test returned exit code 255 > make: *** [build] Error 255

The full build log is available from: http://people.debian.org/~lucas/logs/2011/10/22/libio-async-loop-epoll-perl_0.11-1_lsid64.buildlog A list of current common problems and possible solutions is available at http://wiki.debian.org/qa.debian.org/FTBFS . You're welcome to contribute! About the archive rebuild: The rebuild was done on about 50 AMD64 nodes of the Grid'5000 platform, using a clean chroot. Internet was not accessible from the build systems. Thanks in advance, Salvatore Bonaccorso, Debian Perl Group

Thu Dec 29 19:15:32 2011 leonerd-cpan [...] leonerd.org.uk - Correspondence added

Hmm.. Is this a reliably repeatable failure, or only intermittent? Are you testing against latest release of IO::Async (0.45)? Perhaps would it be possible to get an strace output from the end of this test? strace -o foo.strace perl -Mblib t/02loop-timer.t -- Paul Evans

Thu Dec 29 19:15:33 2011 The RT System itself - Status changed from 'new' to 'open'

Fri Dec 30 02:45:09 2011 CARNIL [...] cpan.org - Correspondence added

Subject:	Re: [rt.cpan.org #73498] libio-async-loop-epoll-perl: FTBFS: Failed 1/8 test programs. 1/86 subtests failed.
Date:	Fri, 30 Dec 2011 08:44:56 +0100
To:	Paul Evans via RT <bug-IO-Async-Loop-Epoll [...] rt.cpan.org>
From:	Salvatore Bonaccorso <carnil [...] cpan.org>

Hey Paul many thanks for your reply on this issue: On Thu, Dec 29, 2011 at 07:15:32PM -0500, Paul Evans via RT wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=73498 > > > Hmm.. > > Is this a reliably repeatable failure, or only intermittent?

I tried to build 20 times in a row now. Out of the 20, 4 where successful, the other failed. I cannot 100% reproduce thus. Show quoted text

> Are you testing against latest release of IO::Async (0.45)?

Yes, it is the latest 0.45 available in Debian unstable. For reference attached the build log of my latest attempt. Show quoted text

> Perhaps would it be possible to get an strace output from the end of > this test? > > strace -o foo.strace perl -Mblib t/02loop-timer.t

~/libio-async-loop-epoll-perl-0.11# strace -o foo.strace perl -Mblib t/02loop-timer.t 1..15 ok 1 - magic constructor yields $loop ok 2 - $loop has refcount 1 ok 3 - $loop has refcount 1 after enqueue_timer ok 4 - loop_once(5) while waiting for timer took at least 1.5 seconds ok 5 - loop_once(5) while waiting for timer took no more than 2.5 seconds ok 6 - cancelled timer does not fire ok 7 - $done still 0 so far ok 8 - requeued timer of delay 2 took at least 1.5 seconds ok 9 - requeued timer of delay 2 took no more than 2.5 seconds ok 10 - $done is 2 after requeued timer ok 11 - loop_once while waiting for negative interval timer took at least 0 seconds ok 12 - loop_once while waiting for negative interval timer took no more than 0.1 seconds ok 13 - Self-cancelling timer still fires ok 14 - Other timers still fire after self-cancelling one ok 15 - $loop has refcount 1 finally ~/libio-async-loop-epoll-perl-0.11# perl -Mblib t/02loop-timer.t 1..15 ok 1 - magic constructor yields $loop ok 2 - $loop has refcount 1 ok 3 - $loop has refcount 1 after enqueue_timer ok 4 - loop_once(5) while waiting for timer took at least 1.5 seconds ok 5 - loop_once(5) while waiting for timer took no more than 2.5 seconds ok 6 - cancelled timer does not fire ok 7 - $done still 0 so far ok 8 - requeued timer of delay 2 took at least 1.5 seconds ok 9 - requeued timer of delay 2 took no more than 2.5 seconds ok 10 - $done is 2 after requeued timer ok 11 - loop_once while waiting for negative interval timer took at least 0 seconds ok 12 - loop_once while waiting for negative interval timer took no more than 0.1 seconds ok 13 - Self-cancelling timer still fires not ok 14 - Other timers still fire after self-cancelling one # Failed test 'Other timers still fire after self-cancelling one' # at /usr/share/perl5/IO/Async/LoopTests.pm line 458. # got: undef # expected: '1' ok 15 - $loop has refcount 1 finally # Looks like you failed 1 test of 15. so it does not fail when run under strace and cannot give this further information. Please let me know if I can provide some more information. Regards Salvatore

Download libio-async-loop-epoll-perl_0.11-1-amd64-20111230-0731.gz
application/octet-stream 4.5k

Message body not shown because it is not plain text.

Download signature.asc
application/pgp-signature 836b

Message body not shown because it is not plain text.

Sun Apr 01 08:30:13 2012 ntyni [...] iki.fi - Correspondence added

From:

ntyni [...] iki.fi

On Tue Dec 27 09:26:23 2011, CARNIL wrote: Show quoted text

> > # Failed test 'Other timers still fire after self-cancelling one' > > # at /usr/share/perl5/IO/Async/LoopTests.pm line 458. > > # got: undef > > # expected: '1' > > # Looks like you failed 1 test of 15. > > t/02loop-timer.t .... > > Dubious, test returned 1 (wstat 256, 0x100) > > Failed 1/15 subtests

I see there are a dozen similar similar failures now on the CPAN tester reports so it's not just us. I looked into this, and the bug seems to be in timeouts rounding down to the nearest full millisecond. See the commit messages of the attached proposed patches. Hope they make the problem clear enough, let me know if you need more information. As the failures are easily reproducible for me, I can also easily provide straces if you like. Just let me know (but note that I'm currently not in the CC field of this ticket; feel free to add me though.) Many thanks for your work, -- Niko Tyni (Debian Perl Group) ntyni@debian.org

Subject:

0002-Fix-loop_once-returning-too-soon.patch

From eab695afa987bde81db22f0dc7f689677cef6794 Mon Sep 17 00:00:00 2001 From: Niko Tyni <ntyni@debian.org> Date: Sun, 1 Apr 2012 15:16:04 +0300 Subject: [PATCH 2/2] Fix loop_once() returning too soon epoll_pwait() uses an integer millisecond value for its timeout parameter, so round up any fractional values. This avoids early returns during the very last millisecond before the next scheduled event. As detailed in the previous commit message, this should fix the intermittent t/02loop-timer.t failures seen in [rt.cpan.org #73498]. --- lib/IO/Async/Loop/Epoll.pm | 4 ++-- t/07loop-once.t | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/lib/IO/Async/Loop/Epoll.pm b/lib/IO/Async/Loop/Epoll.pm index 9f577c3..abb818d 100644 --- a/lib/IO/Async/Loop/Epoll.pm +++ b/lib/IO/Async/Loop/Epoll.pm @@ -22,7 +22,7 @@ use IO::Epoll qw( EPOLLIN EPOLLOUT EPOLLHUP EPOLLERR ); -use POSIX qw( EINTR EPERM SIG_BLOCK SIG_UNBLOCK sigprocmask sigaction ); +use POSIX qw( EINTR EPERM SIG_BLOCK SIG_UNBLOCK sigprocmask sigaction ceil ); =head1 NAME @@ -143,7 +143,7 @@ sub loop_once $self->_adjust_timeout( \$timeout ); - my $msec = defined $timeout ? $timeout * 1000 : -1; + my $msec = defined $timeout ? ceil($timeout * 1000) : -1; my $ret = epoll_pwait( $self->{epoll}, $self->{maxevents}, $msec, $self->{sigmask} ); diff --git a/t/07loop-once.t b/t/07loop-once.t index 2cad2e3..5b46109 100644 --- a/t/07loop-once.t +++ b/t/07loop-once.t @@ -19,9 +19,7 @@ for my $delay (@delays) { $i++ while !$loop->loop_once( 1 ) # callback not triggered yet && (time() - $starttime < 1); # guard against infinite loops - TODO: { local $::TODO = "[rt.cpan.org #73498]"; is ($i, 1, "Just one loop run is enough to wait $delay seconds"); - } is ($done, 1, "Timer actually got triggered within one second"); } -- 1.7.9.1

Subject:

0001-TODO-tests-to-detect-loop_once-returning-too-soon.patch

From 9a6a6e60897777a4c220890ba709eb5ea3572418 Mon Sep 17 00:00:00 2001 From: Niko Tyni <ntyni@debian.org> Date: Sun, 1 Apr 2012 14:10:00 +0300 Subject: [PATCH 1/2] TODO tests to detect loop_once() returning too soon The intermittent t/02loop-timer.t failures seen in [rt.cpan.org #73498] happen because epoll_pwait() uses an integer millisecond value for its timeout parameter, so fractional values will get rounded down. The loop_once() method of IO::Async::Loop::Epoll doesn't take this into account, so the next scheduled callback may therefore well not get triggered on just one loop_once() call. If control comes back to the calling code before the remaining millisecond is spent, the next loop_once() call will end up calling epoll_pwait() with a timeout of zero, which makes it return immediately. This may result in a busy loop of several loop_once() calls until the millisecond tick actually happens. Add (hopefully) more deterministic TODO tests to detect such situations. As IO::Async::Loop will never sleep more than one second ($MAX_SIGWAIT_TIME), we test with a range of timeouts between one and 700 milliseconds that loop_once() will only return when a callback has actually been called. The busy loop seems to be longer on the small timeouts - perhaps epoll_pwait(2) uses a finer resolution for shorter timeouts than the longer ones? --- MANIFEST | 1 + t/07loop-once.t | 27 +++++++++++++++++++++++++++ 2 files changed, 28 insertions(+), 0 deletions(-) create mode 100644 t/07loop-once.t diff --git a/MANIFEST b/MANIFEST index 0010c9e..fc55176 100644 --- a/MANIFEST +++ b/MANIFEST @@ -14,4 +14,5 @@ t/03loop-signal.t t/04loop-idle.t t/05loop-child.t t/06loop-control.t +t/07loop-once.t t/99pod.t diff --git a/t/07loop-once.t b/t/07loop-once.t new file mode 100644 index 0000000..2cad2e3 --- /dev/null +++ b/t/07loop-once.t @@ -0,0 +1,27 @@ +#!/usr/bin/perl -w +my @delays; + +BEGIN { @delays = (.001, .005, .007, .01, .05, .07, .1, .5, .7) } + +use strict; +use Test::More tests => 2 * @delays; +use Time::HiRes q/time/; + +use IO::Async::Loop::Epoll; +my $loop = IO::Async::Loop::Epoll->new; + +for my $delay (@delays) { + my $done; + my $i = 1; + my $starttime = time(); + + $loop->enqueue_timer( delay => $delay, code => sub { $done++ } ); + $i++ while + !$loop->loop_once( 1 ) # callback not triggered yet + && (time() - $starttime < 1); # guard against infinite loops + TODO: { local $::TODO = "[rt.cpan.org #73498]"; + is ($i, 1, "Just one loop run is enough to wait $delay seconds"); + } + is ($done, 1, "Timer actually got triggered within one second"); +} + -- 1.7.9.1

Sun Apr 01 09:02:39 2012 leonerd-cpan [...] leonerd.org.uk - Correspondence added

RT-Send-CC:

ntyni [...] iki.fi

On Sun Apr 01 08:30:13 2012, ntyni@iki.fi wrote: Show quoted text

> I looked into this, and the bug seems to be in timeouts rounding down > to the nearest full millisecond. See the commit messages of the attached > proposed patches. Hope they make the problem clear enough, let me know > if you need more information. > > As the failures are easily reproducible for me, I can also easily > provide straces if you like. Just let me know (but note that I'm > currently not in the CC field of this ticket; feel free to add me > though.)

Ahh yes, that sounds quite plausible. Hrm.. I'll have a go at popping that in, with those expanded tests as well. -- Paul Evans

Wed Apr 04 13:19:12 2012 leonerd-cpan [...] leonerd.org.uk - Fixed in 0.12 added

Wed Apr 04 13:19:32 2012 leonerd-cpan [...] leonerd.org.uk - Correspondence added

Released as 0.12. -- Paul Evans

Wed Apr 04 13:19:33 2012 leonerd-cpan [...] leonerd.org.uk - Status changed from 'open' to 'resolved'