Skip Menu |

This queue is for tickets about the HTTP-Response-Encoding CPAN distribution.

Report information
The Basics
Id: 47033
Status: resolved
Priority: 0/
Queue: HTTP-Response-Encoding

People
Owner: Nobody in particular
Requestors: felix.ostmann [...] thewar.de
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.05
Fixed in: (no value)



Subject: new libwww-perl-5.827 release from 15.06.2009 breaks all tests
http://www.nntp.perl.org/group/perl.cpan.testers/2009/06/msg4189421.html All tests with the new libwww-perl failes! Cannot install Catalyst::Devel currently because of this dependency.
On Wed Jun 17 04:49:47 2009, Sadrak wrote: Show quoted text
> http://www.nntp.perl.org/group/perl.cpan.testers/2009/06/msg4189421.html > > All tests with the new libwww-perl failes! Cannot install > Catalyst::Devel currently because of this dependency.
The patch that makes the test break is <http://github.com/gisle/libwww- perl/commit/9d1902ca5dec84a4996f5605de8e7b037dd17eff>. The Content-Type header in the <meta> header of HTML documents isn't included in the HTTP::Response headers any more. HTTP::Response now provide the content_charset method that I think should be used.
On Wed Jun 17 12:39:38 2009, GAAS wrote: Show quoted text
> HTTP::Response now provide the content_charset method that I think > should be used.
Here's a patch that does exactly that.
diff --git a/Makefile.PL b/Makefile.PL index 6f575f7..9bf0947 100644 --- a/Makefile.PL +++ b/Makefile.PL @@ -12,6 +12,7 @@ WriteMakefile( PREREQ_PM => { 'Encode' => 2.00, 'Test::More' => 0, + 'HTTP::Message' => 5.827, 'HTTP::Response' => 0, }, dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', }, diff --git a/lib/HTTP/Response/Encoding.pm b/lib/HTTP/Response/Encoding.pm index 768aeed..95375e5 100644 --- a/lib/HTTP/Response/Encoding.pm +++ b/lib/HTTP/Response/Encoding.pm @@ -5,11 +5,7 @@ our $VERSION = sprintf "%d.%02d", q$Revision: 0.5 $ =~ /(\d+)/g; sub HTTP::Response::charset { my $self = shift; - return $self->{__charset} if exists $self->{__charset}; - my $content_type = $self->headers->header('Content-Type'); - return unless $content_type; - $content_type =~ /charset=([A-Za-z0-9_\-]+)/io; - $self->{__charset} = $1 || undef; + return $self->content_charset; } sub HTTP::Response::encoder { diff --git a/t/01-file.t b/t/01-file.t index 7203847..8eda2ea 100644 --- a/t/01-file.t +++ b/t/01-file.t @@ -55,4 +55,4 @@ my $uri = URI->new('file://'); $uri->path(File::Spec->catfile($cwd, "t", "t-null.html")); my $res = $ua->get($uri); die unless $res->is_success; -is $res->encoding, undef, "res->encoding eq undef"; +is $res->encoding, "ascii", "res->encoding is ascii";
I noticed that the docs says that the 'charset' method "Tells the charset I<exactly as appears> in the C<Content-Type:> header". This will not be true with my patch as the content_charset method will sniff the content to try to determine the charset. That's why it makes the test pass even if the charset isn't really forwarded to the header. LWP-5.827 also introduce the 'content_type_charset' method that actually "Tells the charset I<exactly as appears> in the C<Content-Type:> header", but if you do your test would not pass again.
Hi Sadrak, On Wed Jun 17 12:54:30 2009, GAAS wrote: Show quoted text
> I noticed that the docs says that the 'charset' method "Tells the > charset I<exactly as appears> > in the C<Content-Type:> header". This will not be true with my patch > as the content_charset > method will sniff the content to try to determine the charset. That's > why it makes the test pass > even if the charset isn't really forwarded to the header. > > LWP-5.827 also introduce the 'content_type_charset' method that > actually "Tells the charset > I<exactly as appears> in the C<Content-Type:> header", but if you do > your test would not pass > again.
i think your patch was mostly fine. There're two methods to determine the charset: one is looking at the HTTP header, the other is looking for in the HTML head for it. The HTTP header one is content_type_charset which should be used first in my opinion. If there's nothing set in the HTTP header the content_charset method can be used to look if there's a charset set in the HTML head. What do you guys think? Cheers, plu
diff -Nuar -r HTTP-Response-Encoding-0.05/lib/HTTP/Response/Encoding.pm HTTP-Response-Encoding-0.05.patched/lib/HTTP/Response/Encoding.pm --- HTTP-Response-Encoding-0.05/lib/HTTP/Response/Encoding.pm 2007-05-12 11:26:10.000000000 +0200 +++ HTTP-Response-Encoding-0.05.patched/lib/HTTP/Response/Encoding.pm 2009-06-19 22:31:05.000000000 +0200 @@ -6,10 +6,7 @@ sub HTTP::Response::charset { my $self = shift; return $self->{__charset} if exists $self->{__charset}; - my $content_type = $self->headers->header('Content-Type'); - return unless $content_type; - $content_type =~ /charset=([A-Za-z0-9_\-]+)/io; - $self->{__charset} = $1 || undef; + $self->{__charset} = $self->content_type_charset || $self->content_charset; } sub HTTP::Response::encoder { diff -Nuar -r HTTP-Response-Encoding-0.05/t/01-file.t HTTP-Response-Encoding-0.05.patched/t/01-file.t --- HTTP-Response-Encoding-0.05/t/01-file.t 2007-05-12 11:06:49.000000000 +0200 +++ HTTP-Response-Encoding-0.05.patched/t/01-file.t 2009-06-19 22:30:32.000000000 +0200 @@ -55,4 +55,4 @@ $uri->path(File::Spec->catfile($cwd, "t", "t-null.html")); my $res = $ua->get($uri); die unless $res->is_success; -is $res->encoding, undef, "res->encoding eq undef"; +is $res->encoding, 'ascii', "res->encoding eq undef";
The content_charset method will return the charset from the HTTP header if set before it looks at the content of the message. There is no need to call content_type_charset yourself unless you really want to suppress the sniffing at the content.
On Wed Jun 24 16:39:08 2009, GAAS wrote: Show quoted text
> The content_charset method will return the charset from the HTTP > header if set before it looks > at the content of the message. There is no need to call > content_type_charset yourself unless > you really want to suppress the sniffing at the content. >
Sorry for my belated response. I made version 0.06 so that it supports both LWP <= 5.826 and LWP >= 5.827. Dan the Maintainer Thereof