Bug #54240 for Web-Scraper: Content decoding

Wed Feb 03 00:50:06 2010 whatson [...] gmail.com - Ticket created

Subject:	Content decoding
Date:	Wed, 3 Feb 2010 15:49:19 +1000
To:	bug-Web-Scraper [...] rt.cpan.org
From:	Andrew Whatson <whatson [...] gmail.com>

Hi,

I've noticed that Web::Scraper doesn't handle HTTP::Response objects with a 'content-encoding' of gzip (and presumably others as well). Poking through the code, it seems to be because an attempt is made at decoding the content manually instead of using $http_response->decoded_content, and this manual decoding checks 'content-type' but ignores 'content-encoding'. A patch is attached that removes all attempts to decode content inside Web::Scraper and instead trusts the HTTP::Response object to decode its content accurately.

Thanks,
Andrew
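For readers following along, the difference described above can be reproduced with a minimal sketch like the one below. It is not the attached patch and does not touch Web::Scraper itself; the sample HTML and the hand-built response are assumptions made purely for illustration. It builds a gzip-compressed HTTP::Response and compares the raw body with what decoded_content returns.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical sample document, not taken from the ticket.
my $html = "<html><body><p>caf\x{e9}</p></body></html>";

# Encode to UTF-8 bytes, then gzip them, as a compressing server would.
my $bytes = encode('UTF-8', $html);
my $gzipped;
gzip(\$bytes => \$gzipped) or die "gzip failed: $GzipError";

# Hand-built response carrying both a charset and a content-encoding.
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=UTF-8',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

# content() returns the body as received: still gzip-compressed bytes.
# Decoding these by charset alone cannot recover the HTML.
my $raw = $res->content;

# decoded_content() first undoes Content-Encoding, then applies the charset.
my $decoded = $res->decoded_content;
die "decoded_content failed" unless defined $decoded;

print "raw body starts with the gzip magic bytes\n"
    if substr($raw, 0, 2) eq "\x1f\x8b";
print "decoded_content matches the original HTML\n"
    if $decoded eq $html;
```

Code that looks only at the 'content-type' charset would be working on $raw here, which is why gzip responses come out garbled.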

Message body is not shown because sender requested not to inline it.

Wed Feb 03 14:28:11 2010 miyagawa [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #54240] Content decoding
Date:	Wed, 3 Feb 2010 11:27:10 -0800
To:	bug-Web-Scraper [...] rt.cpan.org
From:	Tatsuhiko Miyagawa <miyagawa [...] gmail.com>

Hi, thanks for the patch.

Is it possible for you to fork on github http://github.com/miyagawa/web-scraper and also add a unit test if there isn't one yet?

Thanks!

On Tue, Feb 2, 2010 at 9:50 PM, Andrew Whatson via RT <bug-Web-Scraper@rt.cpan.org> wrote:

> Wed Feb 03 00:50:06 2010: Request 54240 was acted upon.
> Transaction: Ticket created by whatson@gmail.com
> Queue: Web-Scraper
> Subject: Content decoding
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: whatson@gmail.com
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=54240 >
>
>
> Hi,
>
> I've noticed that Web::Scraper doesn't handle HTTP::Response objects with a
> 'content-encoding' of gzip (and presumably others as well). Poking through
> the code, it seems to be because an attempt is made at decoding the content
> manually instead of using $http_response->decoded_content, and this manual
> decoding checks 'content-type' but ignores 'content-encoding'. A patch is
> attached that removes all attempts to decode content inside Web::Scraper and
> instead trusts the HTTP::Response object to decode its content accurately.
>
> Thanks,
> Andrew

-- Tatsuhiko Miyagawa

Wed Feb 03 14:28:12 2010 The RT System itself - Status changed from 'new' to 'open'

Thu Feb 04 01:14:55 2010 MIYAGAWA [...] cpan.org - Correspondence added

Fixed in 0.32

Thu Feb 04 01:14:56 2010 MIYAGAWA [...] cpan.org - Status changed from 'open' to 'resolved'
