Bug #102145 for Dancer: test port selection race condition

Mon Feb 16 06:35:17 2015 zefram [...] fysh.org - Ticket created

Subject:	test port selection race condition
Date:	Mon, 16 Feb 2015 11:35:00 +0000
To:	bug-Dancer [...] rt.cpan.org
From:	Zefram <zefram [...] fysh.org>

Dancer's tests select TCP ports in a fundamentally broken manner. Via Test::TCP and Net::EmptyPort, they select a port that appears currently unused, and then when opening the server socket they demand to bind it to that particular port. This is subject to an obvious race condition: the port may be taken by another process between its selection and the attempt to bind to it. For example:

    bind to 0.0.0.0:50890: Address already in use at /opt/perl-5.18.2/lib/site_perl/5.18.2/HTTP/Server/Simple.pm line 732, <DATA> line 16.
        HTTP::Server::Simple::setup_listener(Dancer::Handler::Standalone=HASH(0xf281c0)) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/HTTP/Server/Simple.pm line 299
        HTTP::Server::Simple::run(Dancer::Handler::Standalone=HASH(0xf281c0)) called at /opt/perl-5.18.2/cpan/build/Dancer-1.3132-eJfDF9/blib/lib/Dancer/Handler/Standalone.pm line 39
        Dancer::Handler::Standalone::start(Dancer::Handler::Standalone=HASH(0x154b078)) called at /opt/perl-5.18.2/cpan/build/Dancer-1.3132-eJfDF9/blib/lib/Dancer/Handler.pm line 212
        Dancer::Handler::dance(Dancer::Handler::Standalone=HASH(0x154b078)) called at /opt/perl-5.18.2/cpan/build/Dancer-1.3132-eJfDF9/blib/lib/Dancer.pm line 494
        Dancer::_start("Dancer") called at t/03_route_handler/34_forward_body_post.t line 46
        main::__ANON__(50890) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/Test/TCP.pm line 87
        Test::TCP::start(Test::TCP=HASH(0x14cbd30)) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/Test/TCP.pm line 69
        Test::TCP::new("Test::TCP", "code", CODE(0x18166a0), "port", 50890) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/Test/TCP.pm line 31
        Test::TCP::test_tcp("client", CODE(0x8ee2f0), "server", CODE(0x18166a0)) called at t/03_route_handler/34_forward_body_post.t line 48

I have an automatic daily build of Perl and a bunch of CPAN modules, and if one module fails then it breaks that day's build, so intermittent false failures matter. That's the context in which the above occurred.

The tests should instead leave port selection to the kernel. Bind the socket without specifying a port in the system call, and then find out what port it's actually bound to. Obviously, an additional pipe will be required to report the port from the server process back to the main test process. Or alternatively you could detect the EADDRINUSE failure and retry the test when that occurs.

-zefram
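[Editor's sketch] The two approaches suggested above can be illustrated in a language-neutral way. The following is a minimal Python sketch, not Dancer's or Test::TCP's actual code; the function names `listen_on_kernel_chosen_port` and `bind_with_retry` and the `pick_port` callback are hypothetical names introduced only for illustration.

```python
import errno
import socket


def listen_on_kernel_chosen_port(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks a free port, then look it up."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0: the kernel chooses the port
    sock.listen(5)
    port = sock.getsockname()[1]  # the port the socket is actually bound to
    return sock, port


def bind_with_retry(pick_port, host="127.0.0.1", attempts=5):
    """Alternative: keep pick-then-bind, but retry when EADDRINUSE occurs.

    pick_port is a hypothetical callback returning a candidate port number.
    """
    for _ in range(attempts):
        candidate = pick_port()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, candidate))
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise             # unrelated failure: do not hide it
            continue              # lost the race: try another candidate
        sock.listen(5)
        return sock, sock.getsockname()[1]
    raise RuntimeError("no free port after %d attempts" % attempts)
```

With the first function there is no window between choosing the port and binding it, because the bind is the choice. In a forked test setup the server process would still have to send `port` back to the test process, for example over the pipe mentioned above. The second function narrows the failure mode rather than removing the race: it only turns a lost race into another attempt.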

Wed Feb 17 17:58:52 2016 davidp [...] preshweb.co.uk - Correspondence added

I've been working on a hackish half-fix to this in https://github.com/PerlDancer/Dancer/pull/1150 - binding to 127.0.0.10 instead of 127.0.0.1 which is always going to be more in-demand - but it fails tests on FreeBSD boxes, as they apparently don't know that the whole of 127/8 is the loopback range. Hmm.

I think the "proper" fix is a rather large amount of work, particularly for D1 which is in maintenance mode these days.

Do you still see this causing issues?

Wed Feb 17 17:58:53 2016 The RT System itself - Status changed from 'new' to 'open'

Thu Feb 18 02:02:44 2016 zefram [...] fysh.org - Correspondence added

Subject:	Re: [rt.cpan.org #102145] test port selection race condition
Date:	Thu, 18 Feb 2016 07:02:30 +0000
To:	David Precious via RT <bug-Dancer [...] rt.cpan.org>
From:	Zefram <zefram [...] fysh.org>

David Precious via RT wrote:

>binding to 127.0.0.10 instead of 127.0.0.1

That'll reduce the rate of contention, where that address is available, but doesn't fundamentally fix the race condition.

>Do you still see this causing issues?

I no longer work with the codebase in which I experienced the problem.

-zefram

Thu Feb 18 02:24:50 2016 XSAWYERX [...] cpan.org - Correspondence added

On Mon Feb 16 06:35:17 2015, zefram@fysh.org wrote:

> Dancer's tests select TCP ports in a fundamentally broken manner.
> Via Test::TCP and Net::EmptyPort, they select a port that appears
> currently unused, and then when opening the server socket they demand
> to bind it to that particular port. This is subject to an obvious race
> condition: the port may be taken by another process between its selection
> and the attempt to bind to it.

Quite. This is a race condition I'm quite surprised you've hit. The time between finding an open port and then binding to it is rather short (or should be).

I believe the best way to handle this is by using Plack::Test which doesn't actually open a port. This is what Dancer2 does in its tests. The problems are, of course, that cookies are not available (simple HTTP::Request and HTTP::Response) and you need to add that logic manually with HTTP::Cookies.

Dancer does not have active development. It's in a frozen state, only taking care of important breakage. Considering the priority and rarity of this (and this being a test only, not production) breakage, and how long it might take to rewrite the tests (all those using Test::TCP), we might not see this resolved any time soon.
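[Editor's sketch] The idea of testing without opening a port, and the cookie bookkeeping it leaves to the test author, can be shown with a rough analogy. This is a Python/WSGI sketch, not the Plack::Test or HTTP::Cookies API; the toy `app` and the helper `call_app` are hypothetical names introduced only for illustration.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Toy WSGI app: sets a cookie on the first visit, greets on return."""
    cookie = environ.get("HTTP_COOKIE", "")
    headers = [("Content-Type", "text/plain")]
    if "seen=1" in cookie:
        body = b"welcome back"
    else:
        headers.append(("Set-Cookie", "seen=1"))
        body = b"hello"
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/", cookie=None):
    """Invoke the app in-process; no socket or port is involved."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fills in the remaining WSGI keys
    if cookie:
        environ["HTTP_COOKIE"] = cookie  # cookies are carried over by hand
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body


# Usage: the test itself passes the cookie from one request to the next.
status, headers, body = call_app(app)
status2, headers2, body2 = call_app(app, cookie=headers["Set-Cookie"])
```

Because the application is called as a function, there is no port to race for. The cost is the one described above: nothing plays the role of a browser's cookie jar, so the test has to copy `Set-Cookie` into the next request itself.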
