Subject: HTTP::Message::decoded_content fragile charset detection
HTTP::Message::decoded_content takes the last element of the
$self->header("Content-Type") array, and expects it to contain the charset.
I'm fetching data from a site that has Content-Type with a charset in
the HTTP headers, and Content-Type without a charset in the HTML page
itself:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html">
As a result, $self->header("Content-Type") returns
('text/html; charset=windows-1251', 'text/html'). The last element has no
charset parameter, so the charset is not detected:
  DB<16> x HTTP::Headers::Util::split_header_words($self->header("Content-Type"))
0  ARRAY(0x4a78450)
   0  'text/html'
   1  undef
   2  'charset'
   3  'windows-1251'
1  ARRAY(0x7f63b01223c0)
   0  'text/html'
   1  undef
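
The failure can be reproduced outside the debugger. The following is only a sketch of the lookup as described in this report, not the actual body of decoded_content; the variable names are mine, and the two header values are the ones shown above.

```perl
use strict;
use warnings;
use HTTP::Headers::Util qw(split_header_words);

# The two Content-Type values from the report: HTTP header first,
# META HTTP-EQUIV second.
my @values = ('text/html; charset=windows-1251', 'text/html');
my @ct     = split_header_words(@values);

# Each entry is [ $type, undef, key => value, ... ].
# Looking only at the last entry, as the report describes, finds nothing.
my (undef, undef, %last_param) = @{ $ct[-1] };
my $from_last = $last_param{charset};

# The first entry is the one that carries the charset.
my (undef, undef, %first_param) = @{ $ct[0] };
my $from_first = $first_param{charset};

print "last:  ", (defined $from_last  ? $from_last  : 'undef'), "\n";
print "first: ", (defined $from_first ? $from_first : 'undef'), "\n";
```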
Suggested fix: use $r->content_type instead of
$self->header("Content-Type").
As a workaround, I run this before calling $r->decoded_content:
    $r->header('Content-Type' => join(';', $r->content_type));
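
For anyone who wants to try the workaround in isolation, here is a minimal sketch. It builds the response by hand with push_header, on the assumption that this matches what arrives from the site in question; the status code and variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response carrying both Content-Type values from the report.
my $r = HTTP::Response->new(200);
$r->push_header('Content-Type' => 'text/html; charset=windows-1251');
$r->push_header('Content-Type' => 'text/html');

# In list context, header() returns every value of the field.
my @before = $r->header('Content-Type');

# content_type() looks at the first value only; in list context it
# returns the lower-cased media type followed by the parameter string.
my ($type, $params) = $r->content_type;

# The workaround: overwrite the field with that single rebuilt value.
$r->header('Content-Type' => join(';', $r->content_type));
my @after = $r->header('Content-Type');

print scalar(@before), " value(s) before, ", scalar(@after), " after\n";
print "type=$type params=$params\n";
print "header is now: $after[0]\n";
```

After this, only one Content-Type value remains, so the last element is also the one with the charset.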