Subject: Wrong use of sysread - EAGAIN and EINTR not handled on Unix systems
LWP makes heavy use of sysread (Unix read) while reading the HTTP
response from the corresponding socket.
When sysread "fails" (returns undef), this does not necessarily mean
that a real error occurred. A read from a non-blocking file descriptor
(socket) that has no data available yet also fails, although the
descriptor is still valid and open. The reason for the failure is
available in Perl's $! variable.
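The behavior described above can be reproduced outside of LWP. The following is only a minimal sketch, assuming a Unix system where socketpair is available; the helper name try_read is hypothetical and not part of LWP. It makes a single read attempt on a non-blocking socket and classifies the outcome by inspecting $!.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use Errno qw(EAGAIN EWOULDBLOCK);

# Hypothetical helper (not part of LWP): one read attempt, classified.
sub try_read {
    my ($fh) = @_;
    $! = 0;
    my $n = sysread($fh, my $buf, 1024);
    return ('data', $buf) if defined $n && $n > 0;   # got some bytes
    return ('eof', '')    if defined $n;             # 0 bytes: peer closed
    # undef: decide by errno whether the socket is merely "not ready yet"
    return ('retry', '')  if $! == EAGAIN || $! == EWOULDBLOCK;
    return ('error', "$!");
}

socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$reader->blocking(0);                   # make the reading end non-blocking

my ($state) = try_read($reader);        # nothing has been written yet
print "first attempt: $state\n";

syswrite($writer, "hello") or die "syswrite: $!";
($state, my $data) = try_read($reader); # now there is data to read
print "second attempt: $state ($data)\n";
```

On a typical Unix system the first attempt is expected to be classified as 'retry' rather than as an error, even though sysread returned undef, and the second attempt should deliver the data.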
To avoid spurious errors, LWP should check the reason for a sysread
failure (using $!) and retry the read if the failure was due to one of
the errors EAGAIN or EINTR (Unix error codes returned by read).
The current version of LWP closes the connection to the server after
sysread fails, without checking the reason. The response is therefore
never received and an error is signaled to the user agent, even though
the response could have been received if sysread had been used
correctly.
This misbehavior of LWP might have to do with the fact that on Unix a
call to select or poll can signal that a file descriptor (socket) is
ready for reading while there is actually no data available on it at
that moment. Therefore any call to read (sysread) or select on such a
descriptor should handle EAGAIN. To make things even more robust, EINTR
(the call was interrupted, e.g. by a signal) should be handled the same
way.
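The EINTR case can be demonstrated in isolation as well. This is a rough sketch, not LWP code: the helper name select_interrupted is hypothetical, and it assumes a platform on which select is not automatically restarted after a signal (perlfunc notes that this is implementation-dependent). It arms a short SIGALRM and then waits in select, which Perl reports as a return value of -1 with $! set when the call fails.

```perl
use strict;
use warnings;
use Errno qw(EINTR);
use Time::HiRes qw(ualarm);

# Hypothetical helper (not part of LWP): wait up to $timeout seconds on
# no descriptors at all and report whether a signal cut the wait short.
sub select_interrupted {
    my ($timeout) = @_;
    local $SIG{ALRM} = sub { };      # handler that does nothing
    ualarm(100_000);                 # deliver SIGALRM after about 0.1 s
    $! = 0;
    my $nfound = select(undef, undef, undef, $timeout);
    my $errno  = $! + 0;             # save errno before it can change
    # select signals failure with -1; undef is checked too, to be safe
    my $failed = !defined $nfound || $nfound < 0;
    return $failed && $errno == EINTR;
}

print select_interrupted(2)
    ? "select was interrupted (EINTR)\n"
    : "select ran to completion\n";
```

Nothing is wrong with the (empty) descriptor set in this case, so treating the failure as fatal would be an overreaction; restarting the call with the remaining timeout is the appropriate response.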
For the http/https schemes, the problem is located in the file
LWP/Protocol/http.pm, in the package LWP::Protocol::http::SocketMethods.
The subroutines involved are sysread and can_read.
Here is a non-portable (Unix/POSIX only) example solution to this problem:
...
package LWP::Protocol::http::SocketMethods;
use Errno qw(EINTR EAGAIN);
# This is in fact a blocking read which will die with the message
# 'read timeout' when the io_socket_timeout has expired. So the name
# sysread is misleading, but this is because of the design of LWP;
# just remember that this is a very special sysread.
sub sysread {
    my $self = shift;
    my $timeout = ${*$self}{io_socket_timeout};
    my $stime = time;    # plain declaration; "my ... if" is unreliable
    # This loop exists only to support error recovery upon system call
    # failure caused by EAGAIN and EINTR. It is left when the timeout
    # has expired, an unrecoverable error has occurred, or sysread
    # has succeeded.
    while ( ! $timeout || $timeout > 0 ) {
        # Wait for the socket to become ready.
        # On read timeout this call may die (see can_read below).
        my $ready = $self->can_read($timeout);
        # In case of an error act like the real sysread and return undef;
        # details about the error are available in $!.
        return undef if ! defined $ready;
        # If the socket is not ready (there is no data available yet)
        # then we just continue and try sysread.
        $! = 0;
        my $bytesRead = sysread($self, $_[0], $_[1], $_[2] || 0);
        if ( ! defined $bytesRead && ( $! == EAGAIN || $! == EINTR ) ) {
            if ($timeout) {
                # Subtract only the time elapsed since the last
                # adjustment. Leave the loop once the budget is spent:
                # a remaining value of 0 would otherwise be mistaken
                # for "no timeout".
                my $now = time;
                $timeout -= $now - $stime;
                $stime = $now;
                last if $timeout <= 0;
            }
        } else {
            return $bytesRead;
        }
    }
    die "read timeout";
}
sub can_read {
    my($self, $timeout) = @_;
    my $nfound = 0;
    # This loop exists only to support error recovery upon system call
    # failure caused by EAGAIN and EINTR. It is left when the timeout
    # has expired, an unrecoverable error has occurred, or select
    # has succeeded.
    while ( ! $timeout || $timeout > 0 ) {
        my $fbits = '';
        vec($fbits, fileno($self), 1) = 1;
        $! = 0;
        my $stime = time;    # plain declaration; "my ... if" is unreliable
        $nfound = select($fbits, undef, undef, $timeout);
        # Perl's select reports failure as -1 (like select(2)); undef is
        # checked as well, to be on the safe side.
        my $failed = ! defined $nfound || $nfound < 0;
        if ( $failed && ( $! == EAGAIN || $! == EINTR ) ) {
            # If select failed because of EAGAIN or EINTR we restart the
            # call. If a timeout is set then we first have to adjust it,
            # and stop once nothing is left (0 would mean "no timeout").
            if ($timeout) {
                $timeout -= time - $stime;
                last if $timeout <= 0;
            }
        } elsif ( $failed ) {
            # Any other failure (error code) is signaled to the caller by
            # returning undef. The caller can then inspect $! for details
            # about the failure.
            return undef;
        } else {
            return $nfound > 0;
        }
    }
    # If we got this far then the read timeout has expired and we have
    # to signal this to the caller using an exception (die).
    die('read timeout');
}
...