Subject: test port selection race condition
Date: Mon, 16 Feb 2015 11:35:00 +0000
To: bug-Dancer [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>

Dancer's tests select TCP ports in a fundamentally broken manner.
Via Test::TCP and Net::EmptyPort, they select a port that appears
currently unused, and then when opening the server socket they demand
to bind it to that particular port. This is subject to an obvious race
condition: the port may be taken by another process between its selection
and the attempt to bind to it. For example:
bind to 0.0.0.0:50890: Address already in use at /opt/perl-5.18.2/lib/site_perl/5.18.2/HTTP/Server/Simple.pm line 732, <DATA> line 16.
HTTP::Server::Simple::setup_listener(Dancer::Handler::Standalone=HASH(0xf281c0)) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/HTTP/Server/Simple.pm line 299
HTTP::Server::Simple::run(Dancer::Handler::Standalone=HASH(0xf281c0)) called at /opt/perl-5.18.2/cpan/build/Dancer-1.3132-eJfDF9/blib/lib/Dancer/Handler/Standalone.pm line 39
Dancer::Handler::Standalone::start(Dancer::Handler::Standalone=HASH(0x154b078)) called at /opt/perl-5.18.2/cpan/build/Dancer-1.3132-eJfDF9/blib/lib/Dancer/Handler.pm line 212
Dancer::Handler::dance(Dancer::Handler::Standalone=HASH(0x154b078)) called at /opt/perl-5.18.2/cpan/build/Dancer-1.3132-eJfDF9/blib/lib/Dancer.pm line 494
Dancer::_start("Dancer") called at t/03_route_handler/34_forward_body_post.t line 46
main::__ANON__(50890) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/Test/TCP.pm line 87
Test::TCP::start(Test::TCP=HASH(0x14cbd30)) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/Test/TCP.pm line 69
Test::TCP::new("Test::TCP", "code", CODE(0x18166a0), "port", 50890) called at /opt/perl-5.18.2/lib/site_perl/5.18.2/Test/TCP.pm line 31
Test::TCP::test_tcp("client", CODE(0x8ee2f0), "server", CODE(0x18166a0)) called at t/03_route_handler/34_forward_body_post.t line 48
I have an automatic daily build of Perl and a bunch of CPAN modules,
and if one module fails then it breaks that day's build, so intermittent
false failures matter. That's the context in which the above occurred.
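
To make the race described above concrete, here is a minimal sketch of the
check-then-bind pattern. It is not the actual code of Test::TCP or
Net::EmptyPort; the probe step is a simplified stand-in for any "find a port
that looks free" check, and the competing process is simulated inside the
same script (the hypothetical $intruder socket) so that the failure happens
every time instead of intermittently.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Step 1: "select" a port that appears unused. The probe socket is bound
# to a kernel-chosen port and closed again; this stands in for whatever
# check runs before the real bind (an assumption for illustration).
my $probe = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
    Proto     => 'tcp',
) or die "probe: $@";
my $port = $probe->sockport;
close $probe;

# Step 2: the race window. An unrelated process takes the port.
# Simulated in-process here so the outcome is deterministic.
my $intruder = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => $port,
    Listen    => 1,
    Proto     => 'tcp',
) or die "intruder: $@";

# Step 3: the server under test demands that exact port and fails.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => $port,
    Listen    => 1,
    Proto     => 'tcp',
);
my $addr_in_use = !$server && $!{EADDRINUSE};

if ($server) {
    print "bind to port $port unexpectedly succeeded\n";
}
elsif ($addr_in_use) {
    print "bind to port $port failed: address already in use\n";
}
else {
    print "bind to port $port failed: $@\n";
}
```

In a real test run the window in step 2 is short, which is why the failure
shows up only occasionally.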
The tests should instead leave port selection to the kernel. Bind the
socket to port 0, so that the kernel assigns a free port as part of the
bind itself, and then use getsockname to find out what port it's
actually bound to. Obviously, an additional pipe will
be required to report the port from the server process back to the main
test process. Or alternatively you could detect the EADDRINUSE failure
and retry the test when that occurs.
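
The kernel-assigned-port approach can be sketched roughly as follows. This
is an illustration under assumptions, not a patch for Dancer: the helper
name start_server_on_free_port is hypothetical, and the child runs a trivial
one-shot listener rather than HTTP::Server::Simple, whose listener setup
would need its own changes to accept a port of 0. The point is only the
shape: the child binds to port 0, reads back the real port with sockport
(getsockname underneath), and reports it to the parent over a pipe.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: fork a server on a kernel-chosen port and return
# (child pid, port number) to the caller.
sub start_server_on_free_port {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: bind to port 0 so the kernel picks a free port.
        close $reader;
        my $listener = IO::Socket::INET->new(
            LocalAddr => '127.0.0.1',
            LocalPort => 0,
            Listen    => 5,
            Proto     => 'tcp',
        );
        exit 1 unless $listener;

        # Report the port actually bound; closing the pipe flushes it.
        print {$writer} $listener->sockport, "\n";
        close $writer;

        # Stand-in for the real server: answer one connection and quit.
        my $conn = $listener->accept or exit 1;
        print {$conn} "hello\n";
        close $conn;
        exit 0;
    }

    # Parent: block until the child reports its port (or exits early).
    close $writer;
    my $port = <$reader>;
    close $reader;
    die "server did not report a port" unless defined $port;
    chomp $port;
    return ($pid, $port);
}

my ($pid, $port) = start_server_on_free_port();

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $@";
my $reply = <$client>;
close $client;
waitpid($pid, 0);

print "server on port $port replied: $reply";
```

Because the parent only learns the port after the child has already bound
and started listening on it, there is no window in which another process
can take it, and the parent does not need to poll for the server to come up.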
-zefram