
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 46643
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: pht [...] spatium.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Encoding in header vs http-equiv
Date: Wed, 3 Jun 2009 14:24:00 +0200
To: bug-libwww-perl [...] rt.cpan.org
From: Michal Svoboda <pht [...] spatium.org>
Hi,

LWP seems to prioritize the Content-type encoding from the HTML meta tag over the HTTP header when both are present and do not match. According to w3[1], it should be done the other way around.

Regards,
Michal Svoboda

[1] http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
In what context does this happen? Do you have example code that fails?
Subject: Re: [rt.cpan.org #46643] Encoding in header vs http-equiv
Date: Mon, 15 Jun 2009 12:36:44 +0200
To: Gisle_Aas via RT <bug-libwww-perl [...] rt.cpan.org>
From: Michal Svoboda <pht [...] spatium.org>
Gisle_Aas via RT wrote 130 bytes:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46643 >
>
> In what context does this happen? Do you have example code that fails?
See below. With parse_head => 1 the in-HTML encoding of cp1250 overrides the server-sent utf8, even though the page content is in utf8. The server-sent header should take precedence (see previous mail).

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Encode qw/decode/;

    # Fetch the page once without and once with parse_head.
    for my $hdr (qw/0 1/) {
        print "Parse head: $hdr\n";
        my $agent = LWP::UserAgent->new(parse_head => $hdr);
        my $reply = $agent->get('http://www.iprima.cz');

        # Decode the raw bytes by hand with both candidate charsets.
        my $cp1250 = decode('cp1250', $reply->content);
        my $utf = decode('utf-8', $reply->content);
        my $lwp = $reply->decoded_content;

        # Show all Content-Type values LWP sees (HTTP header plus any from <meta>).
        print join('|', $reply->header('Content-Type')), "\n";

        # cmp yields 0 when LWP's decoding matches the manual one.
        print "cp1250: ", $lwp cmp $cp1250, "\n";
        print "utf-8: ", $lwp cmp $utf, "\n";
    }
I've now fixed this (twice) in the content_charset branch[1]. This will probably make it into the next release.

[1] http://github.com/gisle/libwww-perl/commits/content_charset
The uploaded libwww-perl-5.827 contains these fixes.
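
For reference, the precedence requested in this ticket (charset from the HTTP Content-Type header first, the meta http-equiv value only as a fallback) can be sketched roughly as follows. This is a minimal illustration, not the code from the content_charset branch; the helpers charset_param and pick_charset are hypothetical names and not part of LWP, and the sketch assumes Perl 5.10 or later for the // operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): return the charset parameter
# of a Content-Type value, lowercased, or undef if there is none.
sub charset_param {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    return $content_type =~ /;\s*charset\s*=\s*"?([^\s;"]+)/i ? lc $1 : undef;
}

# Hypothetical helper: prefer the charset from the HTTP header and
# fall back to the <meta http-equiv> value only when the header has none.
sub pick_charset {
    my ($http_content_type, $meta_content_type) = @_;
    return charset_param($http_content_type)
        // charset_param($meta_content_type);
}

# Header and meta tag disagree, as in the report: the header wins.
print pick_charset('text/html; charset=utf-8',
                   'text/html; charset=windows-1250'), "\n";   # prints "utf-8"

# Header carries no charset: the meta value is used.
print pick_charset('text/html',
                   'text/html; charset=windows-1250'), "\n";   # prints "windows-1250"
```

When neither value carries a charset, pick_charset returns undef and the caller has to fall back to some default or to content sniffing; that choice is left out of the sketch.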