
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 17368
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: rg [...] progtech.net
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: PATCH: add option to allow bad encoding in decoded_content
If a web server returns badly encoded data (invalid UTF-8), I still need to be able to get it decoded. Please add an option to ignore such problems, e.g. as in the attached patch. IMHO, since ignoring bad encoding is Encode's default, it might also be the better default here. Alternatively, a way to set a global default error mode in Encode might be the better solution. However, I don't think that's for me to decide, and I'd be happy if my patch makes it into the distribution. Thanks, Rolf.
Subject: lwp.diff
--- Message.pm.orig	Tue Sep 20 20:32:16 2005
+++ Message.pm	Mon Jan 30 20:47:13 2006
@@ -269,7 +269,7 @@
                     $content_ref_iscopy++;
                 }
                 $content_ref = \Encode::decode($charset, $$content_ref,
-                     Encode::FB_CROAK() | Encode::LEAVE_SRC());
+                     $opt{ignore_encode_failure} ? Encode::FB_DEFAULT() : Encode::FB_CROAK() | Encode::LEAVE_SRC());
             }
         }
     };
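The two CHECK values in this patch differ only in what Encode does with malformed input. A minimal sketch using only the core Encode module (assuming Perl 5.8 or later and a 'UTF-8' charset; the byte string "a\xFF" is the same one used by the test in the applied patch) shows both branches. It also shows that omitting CHECK behaves like FB_DEFAULT, which is the Encode default Rolf refers to:

```perl
use strict;
use warnings;
use Encode ();

# "a" followed by a lone 0xFF byte, which is never valid in UTF-8.
my $bytes = "a\xFF";

# CHECK omitted: Encode falls back to FB_DEFAULT and substitutes U+FFFD.
my $default = Encode::decode('UTF-8', $bytes);

# FB_DEFAULT (0): the branch this patch takes when ignore_encode_failure is set.
my $lenient = Encode::decode('UTF-8', $bytes, Encode::FB_DEFAULT());

# FB_CROAK | LEAVE_SRC: the existing strict behaviour, which dies on bad input.
my $strict = eval {
    Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $error = $@;

# Print code points rather than raw wide characters.
print 'lenient: ', join(' ', map { sprintf('U+%04X', ord($_)) } split //, $lenient), "\n";
print 'strict:  ', (defined $strict ? "decoded\n" : "died: $error");
```

Note that the ternary in the patch binds more loosely than "|", so LEAVE_SRC is only combined with FB_CROAK. This is harmless here because FB_DEFAULT does not modify the source string.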
I applied the following patch:

diff --git a/lib/HTTP/Message.pm b/lib/HTTP/Message.pm
index 73293cc..3899568 100644
--- a/lib/HTTP/Message.pm
+++ b/lib/HTTP/Message.pm
@@ -281,7 +281,7 @@ sub decoded_content
                     $content_ref_iscopy++;
                 }
                 $content_ref = \Encode::decode($charset, $$content_ref,
-                     Encode::FB_CROAK() | Encode::LEAVE_SRC());
+                     ($opt{charset_strict} ? Encode::FB_CROAK() : 0) | Encode::LEAVE_SRC());
             }
         }
     };
@@ -609,6 +609,12 @@ C<none> can used to suppress decoding of the charset.
 
 This override the default charset of "ISO-8859-1".
 
+=item C<charset_strict>
+
+Abort decoding when if malformed characters is found in the content.  By
+default you get the substitution character ("\x{FFFD}") in place of
+mailformed characters.
+
 =item C<raise_error>
 
 If TRUE then raise an exception if not able to decode content.  Reason
diff --git a/t/base/message.t b/t/base/message.t
index c2a5577..0494474 100644
--- a/t/base/message.t
+++ b/t/base/message.t
@@ -356,6 +356,17 @@ skip($] < 5.008 || ($Config::Config{'extensions'} !~ /\bEncode\b/)
 ok($@ || "", "");
 ok($m->content, $tmp);
 
+$m->remove_header("Content-Encoding");
+$m->content("a\xFF");
+
+skip($] < 5.008 || ($Config::Config{'extensions'} !~ /\bEncode\b/)
+     ? "No Encode module" : "",
+     sub { $m->decoded_content }, "a\x{FFFD}");
+
+skip($] < 5.008 || ($Config::Config{'extensions'} !~ /\bEncode\b/)
+     ? "No Encode module" : "",
+     sub { $m->decoded_content(charset_strict => 1) }, undef);
+
 $m->header("Content-Encoding", "foobar");
 ok($m->decoded_content, undef);
 ok($@ =~ /^Don't know how to decode Content-Encoding 'foobar'/);
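The applied change inverts Rolf's proposal. Lenient decoding becomes the default, and charset_strict is an opt-in. The sketch below shows that flag logic in isolation using only Encode. The helper decode_charset is hypothetical and not part of libwww-perl. It computes CHECK the same way as the patched line and, like decoded_content without raise_error (which the new test expects to return undef), returns undef when decoding fails:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of libwww-perl) mirroring the patched line:
# lenient by default, strict only when charset_strict is true.
sub decode_charset {
    my ($charset, $bytes, %opt) = @_;
    my $decoded = eval {
        Encode::decode($charset, $bytes,
            ($opt{charset_strict} ? Encode::FB_CROAK() : 0) | Encode::LEAVE_SRC());
    };
    # On failure $decoded is undef and the reason is left in $@.
    return $decoded;
}

my $lenient = decode_charset('UTF-8', "a\xFF");                      # "a\x{FFFD}"
my $strict  = decode_charset('UTF-8', "a\xFF", charset_strict => 1); # undef
```

With a libwww-perl release that includes this change, the corresponding calls on a message object are $m->decoded_content and $m->decoded_content(charset_strict => 1), as exercised in the t/base/message.t hunk.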