
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 27279
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: andrey [...] kostenko.name
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: Error determining of encoding in ASP.NET web pages
The decoded_content subroutine calls the header method in scalar context, so if a site sends two Content-Type headers, decoded_content sees only the first one. This is not a bug, but Micro$oft ASP.NET software sends two headers:

    Content-Type: text/html
    Content-Type: text/html; charset=cp1251

so LWP cannot determine the encoding.

I've attached a diff file that I made to resolve this problem. It calls header in list context. I've also removed FB_CROAK() from Encode::encode's parameters (maybe this is not good, or one more option is needed), because some sites send incorrect characters.

Andrey Kostenko, software developer at Siteheart Inc. (http://kostenko.name)
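For illustration, here is a minimal sketch of the approach described above: read every Content-Type value in list context and merge the parameters, so that a charset on a later header is not lost. The helper name content_type_params is hypothetical and not part of LWP; only HTTP::Headers and HTTP::Headers::Util::split_header_words are real library calls.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Headers::Util qw(split_header_words);

# Hypothetical helper (not part of LWP): merge the parameters of every
# Content-Type header value; the media type comes from the last value.
sub content_type_params {
    my ($headers) = @_;
    my ($ct, %params);

    # In list context, header() returns one element per header value.
    for my $value ($headers->header('Content-Type')) {
        my @words = split_header_words($value) or next;
        my ($type, undef, %p) = @{ $words[-1] };
        $ct     = lc $type;
        %params = (%params, %p);   # later headers override earlier ones
    }
    return ($ct, %params);
}

# Simulate the two headers sent by the ASP.NET site.
my $h = HTTP::Headers->new;
$h->push_header('Content-Type' => 'text/html');
$h->push_header('Content-Type' => 'text/html; charset=cp1251');

my ($ct, %params) = content_type_params($h);
print "$ct charset=$params{charset}\n";
```

With the two headers from the report, the merged parameters contain the charset from the second header even though the first header carries none.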
Subject: 1.diff
170,177c170,174
< foreach ($self->header("Content-Type")){
< if (my @ct = HTTP::Headers::Util::split_header_words($_)) {
< my %sct_param;
< ($ct, undef, %sct_param) = @{$ct[-1]};
< $ct = lc($ct);
< %ct_param=(%ct_param,%sct_param);
< die "Can't decode multipart content" if $ct =~ m,^multipart/,;
< }
---
> if (my @ct = HTTP::Headers::Util::split_header_words($self->header("Content-Type"))) {
> ($ct, undef, %ct_param) = @{$ct[-1]};
> $ct = lc($ct);
>
> die "Can't decode multipart content" if $ct =~ m,^multipart/,;
178a176
>
180a179
>
257c256
< }
---
> }
273c272
< Encode::LEAVE_SRC());
---
> Encode::FB_CROAK() | Encode::LEAVE_SRC());
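The last hunk of the diff drops FB_CROAK() from the flags passed to Encode. As a rough sketch of what that changes, assuming the affected call is Encode::decode (the diff shows only the flag argument, not the function name), the example below decodes the same malformed input with and without FB_CROAK. The input string is made up for the demonstration.

```perl
use strict;
use warnings;
use Encode ();

# Made-up input: the byte 0xFF can never appear in valid UTF-8.
my $bytes = "abc\xFF";

# Strict: with FB_CROAK, decode() dies on the malformed byte.
my $strict = eval {
    Encode::decode('UTF-8', $bytes,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $strict_failed = !defined $strict;

# Lenient: without FB_CROAK, the malformed byte is replaced by the
# Unicode substitution character U+FFFD and decoding continues.
my $lenient = Encode::decode('UTF-8', $bytes, Encode::LEAVE_SRC());

print "strict decode ", ($strict_failed ? "died" : "succeeded"), "\n";
printf "lenient decode ends with U+%04X\n", ord substr($lenient, -1);
```

The trade-off is the one raised in the report: the lenient form tolerates pages with a few bad bytes, but it also silently hides a wrongly detected charset, which is why an option to choose between the two behaviours may be preferable.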
The charset detection logic was reworked in release 5.827. I do think this problem was fixed by that release.