
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 17208
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: zyzstar [...] uid0.sk
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 5.805
Fixed in: (no value)



Subject: max_size is broken
The "max_size" feature of the LWP::UserAgent class is broken. UserAgent sets a "Range" header of 0-($max_size-1). As a result, the HTTP server always sends only the first chunk of data (of size <= max_size) with the response code "206 Partial Content". The response code then gives no way to tell whether the received data is complete or partial, because the server sends "206" even when the first range contains the complete response. (Also, the "Client-Aborted" header is never set if the server was able to send the partial content.)

Example:
----------------------------------------
HTTP/1.1 206 Partial Content
Date: Sun, 22 Jan 2006 18:47:24 GMT
Server: Apache/1.3.33 (Unix) PHP/5.1.0 mod_ssl/2.8.22 OpenSSL/0.9.7e
Last-Modified: Wed, 21 May 2003 15:35:25 GMT
ETag: "53c09a-47aa0-3ecb9cbd"
Accept-Ranges: bytes
Content-Length: 293536
Content-Range: bytes 0-293535/293536
Connection: close
Content-Type: image/jpeg
----------------------------------------

Another problem is that LWP::Protocol does not perform the "max_size" check when the data is read into a callback.

Regards,
Jozef
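In the example above, the Content-Range header (not the status code) shows that the 206 response actually carried the whole entity: bytes 0 through 293535 of a 293536-byte entity. A minimal Perl sketch of that check follows. It assumes a single byte range with a known total length. A multipart/byteranges reply, a "*" total, or a missing header is treated as incomplete. `range_covers_entity` is a hypothetical helper and is not part of LWP. With a real response, it might be called as `range_covers_entity(scalar $response->header('Content-Range'))`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): returns 1 if a Content-Range value
# such as "bytes 0-293535/293536" says the received range is the whole entity.
# Assumes a single byte range with a known total; "bytes 0-499/*" (unknown
# total) or a missing/unparsable header is treated as incomplete.
sub range_covers_entity {
    my ($content_range) = @_;
    return 0 unless defined $content_range;
    my ($first, $last, $total) =
        $content_range =~ m{^\s*bytes\s+(\d+)-(\d+)/(\d+)\s*$}i
        or return 0;
    return ($first == 0 && $last + 1 == $total) ? 1 : 0;
}

for my $cr ('bytes 0-293535/293536', 'bytes 0-499/293536', 'bytes 0-499/*') {
    printf "%-24s => %s\n", $cr, range_covers_entity($cr) ? 'complete' : 'partial';
}
```

This only tells a caller whether a 206 reply is usable as-is. It does not fix the underlying problem of max_size sending a Range request in the first place.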
Subject: max_size is broken (and a suggestion for improvement)
I agree "max_size" could be improved. Here's another complaint about it, and a recommended alternative implementation: http://osdir.com/ml/lang.perl.modules.lwp/2006-04/msg00056.html

In case that link quits working, here's the posting it contains:

###
Hello,

I have a problem with "$ua->max_size()". It can really choke you in some cases. When LWP makes this type of request, it sends a Range request. Some servers are super slow at responding to this type of request and often return a 206 Partial Content response. That response sometimes comes with "Content-Type: multipart/mixed" and a boundary="--bla,bla,bla". This makes it really difficult to figure out what the content is (e.g., text/html, image/gif and so on). As a result, a lot more processing is required to work out what the content is and whether or not it is acceptable. For example:

<--snip-->
my $url = 'http://search.cpan.org/';
my $max_content = 500;

require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->max_size($max_content);
my $response = $ua->get($url);
<--snip-->

I'm sorry I have not included a URL where all this trouble occurs. I stopped using $ua->max_size() some time ago, but now I have a need for it again. The problem is that some servers take forever to respond to this request and often cause the problems mentioned above. My solution was to use a callback instead:

<--snip-->
my $result = '';
my $url = 'http://search.cpan.org/';
my $max_content = 500;

require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
my $response = $ua->get($url,
    ':content_cb'     => \&http_callback,
    ':read_size_hint' => $max_content + 1,
);

# Called by LWP for each chunk of the body; die() aborts the transfer.
sub http_callback {
    my ($data, $response, $protocol) = @_;
    $result .= $data;
    die "max_content reached\n" if length($result) > $max_content;
    return;
}
<--snip-->

While this is not perfect, it did solve all of the above issues. Servers respond super fast and the Content-Type headers are untouched (e.g., text/html).
My request to you is to change the way "$ua->max_size($max_content);" works. It would benefit me, and I'm sure many others, if it worked more like the callback shown above (just stop the download at (x) bytes). It would then act more like a browser does when you click the Stop button: requests would be fast and the server would reply with all header information as expected. It would also allow us to use "LWP::Parallel::RobotUA", which my example above does not.

So why is "$ua->max_size($max_content);" so useful? Well, some people like to create terabyte-sized files and feed them to the robot just to see if they can crash the server. The current "$ua->max_size($max_content);" slows everything way down. It also brings the extra baggage of a 206 response status and a multipart/mixed content type. The callback solves all these issues, but will not work with "LWP::Parallel".

Thanks for listening,
John
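The callback workaround from the posting can be packaged so that the byte limit travels with the callback. This is a sketch only. `make_capped_collector` is a hypothetical helper, not an LWP API. The demonstration feeds chunks to the callback by hand instead of making a request, so it runs without network access. The callback keeps at most $max bytes and then die()s, which is the documented way for a `:content_cb` callback to abort an LWP::UserAgent request.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): returns a ':content_cb' callback that
# keeps at most $max bytes, plus a reference to the collected buffer.
# No Range header is involved; the transfer is simply stopped via die().
sub make_capped_collector {
    my ($max) = @_;
    my $buf = '';
    my $cb = sub {
        my ($data, $response, $protocol) = @_;   # arguments LWP passes to the callback
        $buf .= $data;
        if (length($buf) > $max) {
            substr($buf, $max) = '';              # keep only the first $max bytes
            die "max bytes reached\n";            # aborts the request
        }
        return;
    };
    return ($cb, \$buf);
}

# Offline demonstration: feed the callback 200-byte chunks as LWP would.
my ($cb, $buf_ref) = make_capped_collector(500);
my $aborted = '';
for my $chunk (('x' x 200) x 5) {
    eval { $cb->($chunk, undef, undef); 1 } or do { $aborted = $@; last };
}
print length($$buf_ref), " bytes kept; aborted: ", ($aborted ? 'yes' : 'no'), "\n";
```

With a real user agent, the callback would be wired up as `$ua->get($url, ':content_cb' => $cb, ':read_size_hint' => 501)`. Because no Range header is sent, the server answers with its normal status and Content-Type. LWP reports the die() message in the response's X-Died header.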