Subject: | cached downloads to avoid unneeded HTTP GETs |
If you are using a remote repository (a PAR::Repository::Client
object), each module you require can trigger an HTTP GET of the .par
file from the repository. If you require a large number of such
modules from a repo, all those GETs take too much time. Even though
they are conditional GETs, so only the first one actually downloads
content, the cumulative latency of all the HTTP round trips is still
significant.
The attached patch caches the download result to avoid the extra HTTP
GETs.
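The idea behind the patch can be sketched as a simple memoization wrapper. This is a standalone illustration, not the patched code itself: `fetch_file` is a hypothetical stand-in for the client's real `_fetch_file` (which uses LWP::Simple::mirror() for the conditional GET), and the GET counter exists only to show that repeated requests for the same local file hit the network once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the conditional-GET fetch; in the real
# client this is LWP::Simple::mirror() plus timeout/error handling.
my $gets = 0;
sub fetch_file {
    my ($url, $local_file) = @_;
    $gets++;              # count simulated HTTP GETs
    return $local_file;   # pretend the mirror succeeded
}

# Memoize successful fetches so each file is GET'd at most once per run.
my %fetched_already;
sub fetch_file_cached {
    my ($url, $local_file) = @_;
    return $fetched_already{$local_file}
        if exists $fetched_already{$local_file};
    my $result = fetch_file($url, $local_file) or return;
    return $fetched_already{$local_file} = $result;
}

# Three requests for the same file issue only one GET.
fetch_file_cached("http://repo/foo.par", "/tmp/foo.par") for 1 .. 3;
print "GETs: $gets\n";   # prints "GETs: 1"
```

Note that only successful fetches are cached, so a transient failure is retried on the next require rather than being remembered as a miss.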
-Ken
Subject: | fetch-once.diff |
--- lib/PAR/Repository/Client/HTTP.pm.orig 2012-10-12 10:47:04.000000000 -0500
+++ lib/PAR/Repository/Client/HTTP.pm 2012-10-12 11:43:54.000000000 -0500
@@ -85,6 +85,7 @@
{
my %escapes;
+ my %fetched_already;
sub _fetch_file {
my $self = shift;
$self->{error} = undef;
@@ -99,6 +100,16 @@
$local_file =~ s/([^\w\._])/$escapes{$1}/g;
$local_file = File::Spec->catfile( $self->{cache_dir}, $local_file );
+ # Each module you require from a repo will get you here if the
+ # repo is checked for that module. If you're require'ing a large
+ # number of such modules from a repo, all those
+ # LWP::Simple::mirror() GETs take too much time. Even though
+ # they're conditional GETs, so only the first one actually
+ # downloads content, the latency of all the http GETs is still
+ # significant. So, cache the download result to avoid
+ # HTTP-GET'ing the same file more than once.
+ return $local_file if $fetched_already{$local_file};
+
my $timeout = $self->{http_timeout};
my $old_timeout = $ua->timeout();
$ua->timeout($timeout) if defined $timeout;
@@ -109,7 +120,7 @@
return();
}
- return $local_file if -f $local_file;
+ return $fetched_already{$local_file} = $local_file if -f $local_file;
return();
}
}