Subject: | Possible race condition in POE::Driver::SysRW |
Date: | Thu, 8 Aug 2013 12:25:55 -0700 |
To: | bug-POE [...] rt.cpan.org |
From: | Nicolas Dehaine <nicko [...] phone.com> |
Hi Rocco,
I work on an application built using POE and using POEx::HTTP::Server.
I just completed investigating a problem serving medium size files (10MB) through the server, where the connection would be closed unexpectedly by POEx::HTTP::Server (mid-download, and at random times) and think I traced this down to a problem in Driver::SysRW.
The problem is complex to reproduce - it needs an active POE-based app with a good deal of I/O taking place and I can't narrow it down to simple sample code but I'm happy to provide you access to a VM if you would like to see it.
Here is my undestanding of the problem.
POEx::HTTP::Server detects the presence of the Sys::Sendfile module. If present, it uses sendfile(2) instead of write(2) when sending a file straight from disk as the HTTP response. The problem is only reproducible if Sys::Semdfile is installed.
While downloading a large file (curl client => POEx HTTP server => Sendfile => RW Wheel) , the server occasionally closes the connection. Turning on debugging in HTTP Sever, I observed that the server is receiving a 'EAGAIN' error from the ReadWrite wheel that manages the HTTP socket.
From the POE perspective, the stack is:
POEx::HTTP::Server :: sendfile() function, calling sendfile on a socket
=> Wheel::ReadWrite for socket
=> Driver::SysRW for the flushing the socket
I believe that Driver::SysRW's flush() function returns an error incorrectly.
Because of the semantics of sendfile(), there is never any data in $self->[OUTPUT_QUEUE] (line 124 of SysRW.pm). As a result, the function is executed as:
117 sub flush {
118 my ($self, $handle) = @_;
121 BEGIN { eval { require bytes } and bytes->import; }
154 $self->[TOTAL_OCTETS_LEFT];
155 }
that is, the while() loop is never entered.
As a result, the error handling done in the while loop (setting $!) is never executed (line 132 - 138).
As a result, the caller of this function will see the value of $! set by whichever code it last, rather than by the flush() function. In my case, this is ReadWrite.pm line 188-187.
184 $$driver_buffered_out_octets = $driver->flush($handle);
185
186 # When you can't write, nothing else matters.
187 if ($!) {
The value that is tested at line 187 is not set by flush().
This results in a race condition - when $! is not 0 when entering flush(), and with flush() behaving as a no-op, it returns that initial error unexpectedly, causing it to propagate back to the HTTP server, which closes the connection. In my case, the error is EAGAIN (which is not expected to be returned by flush()).
Proposed patch:
This patch addressed the issue; flush() doesn't return an error unexpectedly anymore. This seems safe, i.e. not changing the expected behavior of the function.
*** /usr/share/perl5/POE/Driver/SysRW.pm.old 2011-12-15 12:52:49.000000000 -0800
--- f/usr/share/perl5/POE/Driver/SysRW.pm 2013-08-08 12:16:54.035492000 -0700
***************
*** 120,125 ****
--- 120,128 ----
# Need to check lengths in octets, not characters.
BEGIN { eval { require bytes } and bytes->import; }
+ # Reset errno in case there is nothing to write
+ $! = 0;
+
# syswrite() it, like we're supposed to
while (@{$self->[OUTPUT_QUEUE]}) {
my $wrote_count = syswrite(
Environment
Ubuntu 12.04
$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 55 registered patches, see perl -V for more detail)
POE 1.54
Sys::Sendfile 0.11
POEx::HTTP::Server 0.0901
I am happy to help troubleshooting this further or testing alternate fixes.
Thank you,
Nicko Dehaine