This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 43507
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: sergii [...] pisem.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 5.821
Fixed in: (no value)



Subject: HTTP::Message::decoded_content fragile charset detection
HTTP::Message::decoded_content takes the last element of the $self->header("Content-Type") array and expects it to contain the charset.

I'm fetching data from a site that sends Content-Type with a charset in the HTTP headers, and Content-Type without a charset in the HTML page itself:

    <META HTTP-EQUIV="Content-Type" CONTENT="text/html">

As a result, $self->header("Content-Type") is ('text/html; charset=windows-1251', 'text/html') and the charset is not detected:

      DB<16> x HTTP::Headers::Util::split_header_words($self->header("Content-Type"))
    0  ARRAY(0x4a78450)
       0  'text/html'
       1  undef
       2  'charset'
       3  'windows-1251'
    1  ARRAY(0x7f63b01223c0)
       0  'text/html'
       1  undef

Suggested fix: use $r->content_type instead of $self->header("Content-Type").

That's what I use as a workaround, before calling $r->decoded_content:

    $r->header('Content-Type' => join(';', $r->content_type));
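For anyone trying to reproduce this without the original site, the situation can probably be simulated with a hand-built response. This is only a sketch: it assumes HTTP::Response and HTTP::Headers::Util are installed, the variable names are made up for illustration, and the second header is pushed by hand to stand in for the value taken from the META tag.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Headers::Util qw(split_header_words);

# Build a response carrying two Content-Type values, as in the report:
# the first from the HTTP headers, the second standing in for the META tag.
my $r = HTTP::Response->new(200, 'OK');
$r->push_header('Content-Type' => 'text/html; charset=windows-1251');
$r->push_header('Content-Type' => 'text/html');

# Parse every value; each one becomes its own array of key/value pairs.
my @parsed = split_header_words($r->header('Content-Type'));

# The last parsed entry is the one without any charset parameter.
my %last = @{ $parsed[-1] };
print "last entry has charset: ", (exists $last{charset} ? "yes" : "no"), "\n";

# Workaround from the report: content_type only looks at the first value,
# so rewriting the header from it drops the charset-less duplicate.
$r->header('Content-Type' => join(';', $r->content_type));
my $fixed = $r->header('Content-Type');
print "after workaround: $fixed\n";
```

After the workaround there is a single Content-Type value left, so whichever element the decoder picks still carries the charset.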
This part has been reworked in libwww-perl-5.827. Please report back if you still find issues with how charsets are detected.