Subject: HTML::HeadParser Complaints for Parsing Undecoded UTF-8
Hi. This is imacat from Taiwan. I get warnings when using
LWP::UserAgent on web sites that serve UTF-8 pages. I dug into
the code, and it seems that HTML::HeadParser complains when it is
given undecoded UTF-8 data, though I do not know why. I wrote a patch
for this, and with it the warnings are gone. But I do not know if this
patch (parsing raw undecoded UTF-8) is a good idea. Could you look
into this issue?

I have attached my patch. The error log is below. Please tell me if
there is any problem. Thank you.
imacat@rinse /tmp % cat /tmp/test.pl
#! /usr/bin/perl -w
use LWP::UserAgent;
use vars qw($UA $url $r);
$UA = new LWP::UserAgent;
$url = "http://zh.wikipedia.org/";
$r = $UA->get($url);
print "$url " . $r->status_line . "\n";
imacat@rinse /tmp % /tmp/test.pl
Parsing of undecoded UTF-8 will give garbage when decoding entities at
/home/imacat/lib/perl5/LWP/Protocol.pm line 115.
http://zh.wikipedia.org/ 200 OK
imacat@rinse /tmp %
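
For reference, here is a minimal sketch of two ways to feed UTF-8 to HTML::HeadParser directly, outside of LWP. It is not the attached patch. The sample document and the helper names title_from_decoded and title_with_utf8_mode are made up for illustration. The second helper assumes an HTML::Parser new enough to provide utf8_mode (3.40 or later, as far as I know).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::HeadParser;

# Hypothetical sample: raw UTF-8 bytes, as they would arrive from the network.
my $bytes = "<html><head><title>\xE7\xB6\xAD\xE5\x9F\xBA &amp; wiki</title></head>"
          . "<body>text</body></html>";

# Approach 1: decode the bytes into Perl characters before parsing,
# so the parser never sees undecoded UTF-8.
sub title_from_decoded {
    my ($raw) = @_;
    my $p = HTML::HeadParser->new;
    $p->parse(decode('UTF-8', $raw));
    return $p->header('Title');
}

# Approach 2: leave the data as bytes and tell the parser that they are
# UTF-8. utf8_mode is inherited from HTML::Parser (assumed 3.40 or later).
sub title_with_utf8_mode {
    my ($raw) = @_;
    my $p = HTML::HeadParser->new;
    $p->utf8_mode(1);
    $p->parse($raw);
    return $p->header('Title');
}

binmode(STDOUT, ':encoding(UTF-8)');
print title_from_decoded($bytes), "\n";
```

Which approach fits LWP better depends on whether the response content has already been decoded by the time the head parser sees it. I am not certain about that part.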