Subject: UTF8 content not decoded
I'm not sure if T::W::M::Catalyst is at fault for this...
My Catalyst app is emitting UTF-8 HTML with a BOM. Way upstream of
T::W::M::C I get this error:
Parsing of undecoded UTF-8 will give garbage when decoding entities at
/Users/chris/perl/lib/perl5/site_perl/5.8.6/darwin-thread-multi-2level/HTML/PullParser.pm
line 83.
I have attached a quick-and-dirty patch that resolves the issue for me.
The patch checks for a "charset=..." parameter in the Content-Type response
header and, if one is present, decodes $response->content with that
encoding. I've only tested it with
Content-Type: text/html; charset=utf-8
Maybe the check should also look at a <meta http-equiv="Content-Type"> tag
if the Content-Type header lacks a charset? Or look for an XML declaration?
I suspect this patch might break if this header is set:
Content-Encoding: gzip
but that shouldn't happen under this mocked Catalyst, right?
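A rough sketch of that fallback order (header, then XML declaration, then meta tag), in case it is useful. This is not part of the attached patch: the helper names guess_charset and decode_body are hypothetical, the 1024-byte sniffing window is an arbitrary assumption, and the regexes are deliberately loose rather than a real HTML or XML parse. Unknown charset names leave the body untouched instead of dying.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Test::WWW::Mechanize::Catalyst).
# Returns the declared charset name, or nothing if none is found.
sub guess_charset {
    my ($content_type, $body) = @_;

    # 1. charset parameter of the Content-Type header, quoted or not
    if (defined $content_type
        && $content_type =~ m/charset \s* = \s* ["']? ([^\s;"']+)/xmsi) {
        return $1;
    }

    # Only sniff the start of the body (window size is an assumption)
    my $head = substr($body, 0, 1024);

    # 2. XML declaration, optionally preceded by a UTF-8 BOM
    if ($head =~ m/\A (?:\xEF\xBB\xBF)? \s* <\?xml [^>]*? encoding \s* = \s* ["'] ([^"']+) ["']/xms) {
        return $1;
    }

    # 3. <meta http-equiv="Content-Type" content="...; charset=...">
    if ($head =~ m/<meta [^>]*? charset \s* = \s* ["']? ([^\s;"'>\/]+)/xmsi) {
        return $1;
    }

    return;
}

# Hypothetical wrapper: decode the body only when the declared
# charset is one that Encode actually knows about.
sub decode_body {
    my ($content_type, $body) = @_;

    my $charset = guess_charset($content_type, $body);
    return $body if !defined $charset;

    my $enc = Encode::find_encoding($charset);
    return $body if !$enc;

    return Encode::decode($enc->name, $body);
}
```

Inside the patched method this would be called roughly as decode_body(scalar $response->header('Content-Type'), $response->content). It does not strip a leading BOM from the decoded text and does not handle Content-Encoding: gzip.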
Subject: twmc.patch
--- /Users/chris/perl/lib/perl5/site_perl/Test/WWW/Mechanize/Catalyst.pm 2006-06-06 01:40:30.000000000 -0500
+++ lib/Test/WWW/Mechanize/Catalyst.pm 2006-12-06 16:05:31.000000000 -0600
@@ -2,6 +2,7 @@
use strict;
use warnings;
use Test::WWW::Mechanize;
+use Encode qw();
use base qw(Test::WWW::Mechanize);
our $VERSION = "0.37";
@@ -51,6 +52,12 @@
$end_of_chain->previous($old_response); # ...and add us to it
}
+ if ($response->header('Content-Type') &&
+ $response->header('Content-Type') =~ m/charset=["']?([^\s;"']+)/xmsi) {
+ my $encoding = $1;
+ $response->content(Encode::decode($encoding, $response->content()));
+ }
+
return $response;
}