Subject: high CPU usage when waiting for data on the destination socket
Information:
libwww-perl-6.04
perl v5.12.2 on x86_64-linux-gnu-thread-multi
Linux wihiie 3.2.0-2-amd64 #1 SMP Sun Mar 4 22:48:17 UTC 2012 x86_64
GNU/Linux
When using LWP::UserAgent to connect to an HTTP(S) web server, CPU
usage goes to 100% (per core, of course) while waiting for data on that
socket. It is in fact an infinite loop that only quits when the socket
has an error.
Sample script:
use strict; use warnings;
use LWP::UserAgent;
my $u = LWP::UserAgent->new(
ssl_opts => {verify_hostname => 0},
);
$u->get("https://127.0.0.1:1443/BIGFILE", ':content_cb' => sub {});
This can be checked with strace, which shows only this call, over and over:
read(3, 0x13c5ff3, 5) = -1 EAGAIN (Resource temporarily unavailable)
This is easy to reproduce by using socat as a TCP proxy (even for
HTTPS/SSL): start socat, then background it while a sample script is
downloading data through it. This is of course not a real-world
situation, as networking is normally fine these days. The problem is
that sometimes networking isn't great and, moreover, when the data
doesn't come in *fast enough* the loop also burns CPU cycles for nothing.
A loop like that also makes the timeout parameters impossible to
implement and enforce.
I've found several loops of this kind:
IO::Socket::SSL has one in readline() (several copy-pasted ones, in
fact).
Net::HTTP has one which does a select+timeout loop without end while
waiting for HTTP response headers on a persistent chunked connection.
This particular one is in LWP::Protocol::http:
READ:
{
    $n = $socket->read_entity_body($buf, $size);
    unless (defined $n) {
        redo READ if $!{EINTR} || $!{EAGAIN};
        die "read failed: $!";
    }
    redo READ if $n == -1;
}
Net::HTTP::Methods' read_entity_body() does a non-blocking read, and here
the retry is not guarded by a select(), so on EAGAIN the read is simply
repeated immediately.
I must admit that this might be difficult to fix in LWP::UserAgent, as
this module probably isn't meant to do things like that, most likely
because it's next to impossible to have an event loop like AnyEvent in
LWP::UserAgent. With AnyEvent::HTTP this works next to perfectly; however,
it lacks a few other things compared to LWP::UserAgent.
It is possible, at least, to have all reads and writes backed by a
decent select() with a correct bitmask: the fd is known there. Adding
the timeout variable is perfectly possible too. The same goes for
Net::HTTP's select+timeout call.
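
A minimal sketch of that idea, assuming a plain non-blocking socket. The
helper read_with_timeout() is a hypothetical name for illustration, not
an existing LWP or Net::HTTP function, and a local socket pair stands in
for the HTTP connection. On EAGAIN it waits in select() (via IO::Select)
until the socket is readable or a deadline passes, instead of retrying
the read immediately:

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Time::HiRes qw(time);

# Hypothetical helper (not an LWP or Net::HTTP function): read up to $size
# bytes from a non-blocking socket, sleeping in select() instead of spinning.
sub read_with_timeout {
    my ($sock, $size, $timeout) = @_;
    my $sel      = IO::Select->new($sock);
    my $deadline = time() + $timeout;
    while (1) {
        my $buf = '';
        my $n   = sysread($sock, $buf, $size);
        return $buf if defined $n;    # data, or '' at end of file
        die "read failed: $!" unless $!{EINTR} || $!{EAGAIN};
        my $left = $deadline - time();
        die "read timeout\n" if $left <= 0;
        $sel->can_read($left);        # wait in the kernel until readable
    }
}

# Demo on a local socket pair standing in for the HTTP connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$reader->blocking(0);

syswrite($writer, "hello");
print "got: ", read_with_timeout($reader, 1024, 1), "\n";

# Nothing more is written, so this call waits about 0.2s and then gives up.
my $ok = eval { read_with_timeout($reader, 1024, 0.2); 1 };
print "second read: ", ($ok ? "data" : $@);
```

For an SSL socket this would probably not be enough on its own, since
data can already be buffered inside the SSL layer while select() reports
nothing readable on the fd; IO::Socket::SSL's pending() would have to be
checked before waiting.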