I agree "max_size" could be improved. Here's another complaint about it,
and a recommended alternative implementation:
http://osdir.com/ml/lang.perl.modules.lwp/2006-04/msg00056.html
In case that link quits working, here's the posting it contains:
###
Hello,
I have a problem with "$ua->max_size()". It can really choke you
in some cases. It seems that when max_size is set, LWP sends a Range
request. Some servers are very slow to respond to this type of request
and often return a 206 Partial Content response, sometimes with
"Content-Type: multipart/mixed" and a boundary="--bla,bla,bla". That
makes it hard to tell what the content actually is (e.g., text/html,
image/gif and so on), so a lot more processing is required to decide
whether or not it is acceptable. For example:
<--snip-->
my $url = 'http://search.cpan.org/';
my $max_content = 500;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->max_size($max_content);
my $response = $ua->get($url);
<--snip-->
I'm sorry I have not included a URL where all this trouble shows up;
I stopped using $ua->max_size() some time ago, but now I have a need
for it again. The problem is that some servers take forever to respond
to this request and often cause the problems mentioned above. My
solution was to use a callback instead:
<--snip-->
my $result = '';
my $url = 'http://search.cpan.org/';
my $max_content = 500;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
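# Options must be key/value pairs; the chunk size goes in ':read_size_hint'.
my $response = $ua->get($url,
    ':content_cb'     => \&http_callback,
    ':read_size_hint' => $max_content + 1);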
sub http_callback {
    my ($data, $response, $protocol) = @_;
    $result .= $data;
    # Abort the download once more than $max_content bytes have arrived.
    die if length($result) > $max_content;
    return();
}
<--snip-->
While this is not perfect, it did solve all the above issues. Servers
respond super fast and the content-type headers are untouched (e.g.,
text/html).
My request to you is to change the way "$ua->max_size($max_content);"
works. It would benefit me and I'm sure many others if it worked more
like the callback shown above (just stop the download at x bytes). This
would then act more like a browser does when you click the Stop button.
Requests would be fast and the server would reply with all header
information as expected. It would also allow us to use
"LWP::Parallel::RobotUA", which my example above does not.
So why is "$ua->max_size($max_content);" so useful? Well some people
like to create terabyte files and feed them to the robot just to see if
they can crash the server. Using the current
"$ua->max_size($max_content);" slows everything way down and comes with
the extra baggage of a 206 response header and a multipart/mixed
content-type. The callback solves all these issues, but will not work
with "LWP::Parallel".
Thanks for listening,
John
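
For what it's worth, here is a minimal sketch of how the callback idea from that posting could be wrapped up without a global buffer. The names make_capped_collector and fetch_capped are my own, hypothetical helpers, not part of LWP. It assumes an LWP::UserAgent whose get() accepts the ':content_cb' and ':read_size_hint' options, and it relies on LWP recording a die inside the callback in the X-Died response header. Unlike the original, it also trims the buffer to the limit before aborting, which is a design choice rather than something the posting asked for.

```perl
use strict;
use warnings;

# Build a content callback that keeps at most $max_bytes and then aborts
# the transfer by dying. Returns the callback and a reference to the buffer.
# (Hypothetical helper name; not an LWP function.)
sub make_capped_collector {
    my ($max_bytes) = @_;
    my $buffer = '';
    my $callback = sub {
        my ($chunk, $response, $protocol) = @_;
        $buffer .= $chunk;
        if (length($buffer) > $max_bytes) {
            # Keep only the allowed prefix, then stop the download.
            $buffer = substr($buffer, 0, $max_bytes);
            die "size limit reached\n";
        }
        return;
    };
    return ($callback, \$buffer);
}

# Fetch $url with an existing user agent, capped at $max_bytes.
# Returns the response object, the (possibly truncated) body, and a flag
# that is 1 when the callback aborted the transfer.
# (Hypothetical helper name; not an LWP function.)
sub fetch_capped {
    my ($ua, $url, $max_bytes) = @_;
    my ($callback, $buffer_ref) = make_capped_collector($max_bytes);
    my $response = $ua->get(
        $url,
        ':content_cb'     => $callback,
        ':read_size_hint' => $max_bytes + 1,
    );
    # A die inside the callback shows up in the X-Died header.
    my $truncated = defined $response->header('X-Died') ? 1 : 0;
    return ($response, $$buffer_ref, $truncated);
}
```

It would be called along the lines of `my ($response, $body, $truncated) = fetch_capped(LWP::UserAgent->new(timeout => 10), 'http://search.cpan.org/', 500);`. Because a content callback is in use, the body comes back in the buffer rather than in `$response->content`, and the headers (including Content-Type) are those of an ordinary, non-Range request.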