
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 41488
Status: rejected
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: n1vux [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 5.10
Fixed in: (no value)



Subject: LWP mirror() and mod_gzip compression
I am reporting on behalf of the programmers on the OpenStreetMap.org project; I have not reproduced this locally. They report that enabling mod_gzip for server-side compression of their huge, 95%-compressible XML files caused LWP's mirror() to save the gzipped form locally instead of decoding the transmission encoding. (Quite reasonably, the XML parser then chokes on a UTF error.)

The workarounds are either to have the caller of the LWP UA request forced compression, or to switch from mirror() to get() and then use decoded_content(). I would expect a middle ground to be possible: accepting and decoding gzip without requiring it.

Is this a bug in libwww ignoring mod_gzip's selected encoding, or in mod_gzip picking an unexpected encoding by ignoring the UA's restrictive accept list?

One of the reporting devs, apm_, says: my version of perl is "This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi" and LWP is "This is libwww-perl-5.812" (Ubuntu Hardy). If more information is needed, I can inquire further or attempt a minimal reproducible case.
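The get() plus decoded_content() workaround described in the report can be sketched roughly as follows. This is a minimal illustration, not code from the OSM scripts; `save_decoded` and `fetch_decoded` are hypothetical helper names. Passing `charset => 'none'` asks decoded_content() to remove the Content-Encoding (e.g. gzip) while leaving the bytes otherwise untouched, so the XML parser still sees the document's own character encoding. Note that this gives up mirror()'s If-Modified-Since handling.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: write the body of $response to $path with any
# Content-Encoding (e.g. gzip) removed. charset => 'none' skips charset
# decoding, so the file keeps the document's original bytes.
sub save_decoded {
    my ($response, $path) = @_;
    my $body = $response->decoded_content(charset => 'none');
    die "Could not decode content for $path\n" unless defined $body;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!\n";
    print {$fh} $body;
    close $fh or die "Cannot close $path: $!\n";
    return;
}

# Hypothetical stand-in for $ua->mirror($url, $dest). Unlike mirror(),
# it always downloads the whole file (no If-Modified-Since check).
sub fetch_decoded {
    my ($ua, $url, $dest) = @_;
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    save_decoded($response, $dest);
    return $response;
}
```

A caller would use it like `fetch_decoded(LWP::UserAgent->new, $url, $dest)` where it previously called mirror(). The trade-off is bandwidth: without the conditional request, every run transfers the full file.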
I do think that the mirror method already does the right thing. It has no business undoing any Content-Encoding that's specified. On the other hand, it would be cool to have some pre-canned response-processing handlers available that undo Content-Encodings on the fly. That's not there yet.
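Since mirror() stores the entity exactly as received, one client-side option is to tell the server explicitly that no content coding is wanted. RFC 2616 (section 14.3) lets a server assume any content coding is acceptable when a request carries no Accept-Encoding header, so sending `Accept-Encoding: identity` is a reasonable thing to try. Whether mod_gzip honours it depends on its configuration; the sketch below is untested against the OSM server, and `make_identity_ua` is a hypothetical name.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: a user agent that asks for the unencoded entity on
# every request (default headers are also applied to mirror() requests),
# so a server that respects the header should send plain XML.
sub make_identity_ua {
    my $ua = LWP::UserAgent->new;
    $ua->default_header('Accept-Encoding' => 'identity');
    return $ua;
}

# Usage sketch ($url and $dest assumed to be set by the caller):
# my $resp = make_identity_ua()->mirror($url, $dest);
```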
CC: Bill Ricker <bill.n1vux [...] gmail.com>
Subject: Re: [rt.cpan.org #41488] LWP mirror() and mod_gzip compression
Date: Mon, 23 Feb 2009 18:55:38 -0800 (PST)
To: bug-libwww-perl [...] rt.cpan.org
From: Bill Ricker <n1vux [...] yahoo.com>
Gisle,

Thanks for all your good work for the Perl community.

I am only a relay messenger on this, since no one else on the IRC channel at the time was a serious Perl coder.

I can accept that this is a feature request for an undo-transmission-encodings option for mirror, not a bug in the default case, which seems to work well for everything else.

However, I think the OSMers' expectation that mirror would by default clone a directory exactly is not entirely unreasonable. The usual use case for mirroring (whether with LWP, wget or ...) is to clone a website directory, with the expectation that when done it is as ready to serve or otherwise use as the original. That the server being cloned uses mod_gzip for efficiency doesn't change what the file on disk should look like, even if mod_gzip caches compressed copies.

The server admins on the OSM FLOSS project found that a previously reliable Perl clone-and-build script failed when the source server turned on transmission compression with mod_gzip, leaving the working directory with compressed files unlike those in the original directory.

If the HTTP headers *don't* make clear whether the gzip encoding is original or applied for transit only, that would be an underlying protocol bug out of your control.

Cheers and thanks,
Bill Ricker
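The "undo transmission encodings" step Bill asks for can also be done by the caller after mirror() returns. A minimal sketch, assuming the server's `Content-Encoding: gzip` was applied for transit only (if the file on the server really is gzipped, this would wrongly unpack it). `gunzip_in_place` is a hypothetical helper built on the core IO::Uncompress::Gunzip module. It restores the file's timestamps because mirror() sets the local modification time from Last-Modified and sends it as If-Modified-Since on the next run.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: replace the gzip file at $path with its decompressed
# content, keeping the access/modification times that mirror() set.
sub gunzip_in_place {
    my ($path) = @_;
    my ($atime, $mtime) = (stat $path)[8, 9];
    defined $mtime or die "Cannot stat $path: $!\n";
    my $tmp = "$path.gunzip.$$";    # same directory, so rename is atomic
    gunzip($path => $tmp) or do {
        unlink $tmp;
        die "gunzip $path failed: $GunzipError\n";
    };
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!\n";
    utime $atime, $mtime, $path or die "Cannot set times on $path: $!\n";
    return;
}

# Usage sketch, after mirroring ($ua, $url and $dest assumed):
# my $resp = $ua->mirror($url, $dest);
# if ($resp->is_success
#     && lc($resp->header('Content-Encoding') // '') eq 'gzip') {
#     gunzip_in_place($dest);
# }
```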
Just for the record: I was also a "victim" of the OpenStreetMap feature of (sometimes) delivering gzipped content. But I found it convenient to keep the gzipped content as is, so I only change the extension of the mirrored file to reflect this:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my ($url, $dest) = @ARGV;   # remote URL and local file name
    my $ua = LWP::UserAgent->new;

    my $resp = $ua->mirror($url, $dest);
    # mirror() returns 304 Not Modified when the local copy is current;
    # that is not an error, and there is nothing to rename.
    if (!$resp->is_success && $resp->code != 304) {
        die "No success while mirroring '$url' to '$dest': " . $resp->status_line;
    } elsif ($resp->is_success) {
        no warnings 'uninitialized'; # content-encoding header may be missing
        if ($resp->header('content-encoding') eq 'gzip' && $dest !~ m{\.gz$}) {
            rename $dest, "$dest.gz"
                or die "Cannot rename $dest to $dest.gz: $!";
        }
    }

NB: My preferred XML parser (XML::LibXML) can transparently handle gzipped files.

Regards,
Slaven